Foreword


Those of us involved in the creation of the Handbook of Artificial Intelligence, both
writers and editors, have attempted to make the concepts, methods, tools, and main results
of artificial intelligence research accessible to a broad scientific and engineering audience.
Currently, AI work is familiar mainly to its practicing specialists and other interested
computer scientists. Yet the field is of growing interdisciplinary interest and practical
importance. With this book we are trying to build bridges that are easily crossed by
engineers, scientists in other fields, and our own computer science colleagues.

In the Handbook we intend to cover the breadth and depth of AI, presenting general
overviews of the scientific issues, as well as detailed discussions of particular techniques
and important AI systems. Throughout we have tried to keep in mind the reader who is not a
specialist in AI.

As the cost of computation continues to fall, new areas of computer applications
become potentially viable. For many of these areas, there do not exist mathematical "cores"
to structure calculational use of the computer. Such areas will inevitably be served by
symbolic models and symbolic inference techniques. Yet those who understand symbolic
computation have been speaking largely to themselves for twenty years. We feel that it is
urgent for AI to "go public" in the manner intended by the Handbook.

Several other writers have recognized a need for more widespread knowledge of AI
and have attempted to help fill the vacuum. Lay reviews, in particular Margaret Boden's
Artificial Intelligence and Natural Man, have tried to explain what is important and
interesting about AI, and how research in AI progresses through our programs. In addition,
there are a few textbooks that attempt to present a more detailed view of selected areas
of AI, for the serious student of computer science. But no textbook can hope to describe all
of the sub-areas, to present brief explanations of the important ideas and techniques, and to
review the forty or fifty most important AI systems.

The Handbook contains several different types of articles. Key AI ideas and techniques
are described in core articles (e.g., basic concepts in heuristic search, semantic nets).
Important individual AI programs (e.g., SHRDLU) are described in separate articles that
indicate, among other things, the designer's goal, the techniques employed, and the reasons
why the program is important. Overview articles discuss the problems and approaches in
each major area. The overview articles should be particularly useful to those who seek a
summary of the underlying issues that motivate AI research.


Eventually the Handbook will contain approximately two hundred articles. We hope that
the appearance of this material will stimulate interaction and cooperation with other AI
research sites. We look forward to being advised of errors of omission and commission. For a
field as fast moving as AI, it is important that its practitioners alert us to important
developments, so that future editions will reflect this new material. We intend that the
Handbook of Artificial Intelligence be a living and changing reference work.

The articles in this edition of the Handbook were written primarily by graduate students
in AI at Stanford University, with assistance from graduate students and AI professionals at
other institutions. We wish particularly to acknowledge the help from those at Rutgers
University, SRI International, Xerox Palo Alto Research Center, MIT, and the RAND
Corporation.


The authors of this chapter on Automatic Programming research are Robert Elschlager
and Jorge Phillips. They have worked from material supplied by the AP researchers
themselves, including David Barstow, Cordell Green, Neil Goldman, George Heidorn,
Elaine Kant, Zohar Manna, Brian McCune, Gregory Ruth, Richard Waldinger, and
Richard Waters.


Avron  Barr 
Edward  Feigenbaum 


Stanford  University 
July,  tore 
Automatic Programming (AP) is a new, dynamic, and not precisely defined area of
artificial intelligence. This overview discusses the definitions, history, motivating forces, and
goals of automatic programming and includes a brief description of the basic characteristics
and central issues of AP systems. The article begins with a section discussing the various
possible definitions of automatic programming, the background from which it has come into
existence, and some of its general motivating forces and goals. The next section
describes four characteristics of all AP systems: the method by which a user of such a
system specifies or describes the desired program, the target language in which the system
writes the program, the problem or application area to which the system is addressed, and
the approach or operational method employed by the system. Next, a section discusses four
basic issues, one or more of which concern all AP systems: the representation and
processing of partial or incomplete information; the transformation of structures, and
especially the transformation of program descriptions into other descriptions (in this chapter,
the term program description includes the user's specification of the desired program, any
internal representations of the program, and the target language implementation); the
efficiency of the target language implementation; and the system's capabilities for aiding in
the understanding of the program. Following this overview, the reader will find articles on the
methods of specifying programs in AP systems, on some of the basic operational methods
employed in such systems, and then eight articles describing most of the major AP projects.


Definition 

The bulk of the research in AP has appeared in the 1970s, and it is not surprising that
there is a lack of agreement as to the definition, scope, and direction of the endeavor.
Several brief definitions of automatic programming have been suggested in the literature, but
considering the newness of the area one should not expect these definitions to be precise.
One definition says simply that AP is something that will save people the chores of
programming (Biermann, 1976a). Another states that an AP system carries out part of the
programming activity currently performed by a human in constructing a program written in
some machine-executable language, given the definition of the problem to be solved; here,
the essence of an AP system is that it assumes some responsibilities otherwise borne by a
human, and thereby reduces the person's task (Hammer & Ruth, 1979). Yet another states
that AP means having the computer help write its own programs (Heidorn, 1977). A fourth
describes AP as the application of a computing system to the problem of effectively utilizing
that or another computing system in the performance of a task specified by the user
(Balzer, 1973b).

To summarize, perhaps we can define AP here as the automation of some part of the
program-writing activities that are currently performed by people and not yet
performed by machine. This definition therefore excludes such systems and software
environments as assembly languages and high-level languages such as FORTRAN, COBOL,
PL/I, ALGOL, or LISP, as well as such programming aids as symbol tables, cross-reference
generators, text editors, and debugging systems.

Other, more extensive definitions have been suggested. One definition (Balzer, 1973b)
"rates" AP systems according to a measure of merit that includes the following factors:

(a) the amount of time and effort needed by the programmer to formulate and
specify the desired program;

(b) the efficiency of the decisions made by the system in designing the program,
and consequently the overall efficiency of the program that is produced by
the system;

(c) the ease with which future modifications can be incorporated in the program;

(d) the reliability and ruggedness of the program;

(e) the amount of computer resources, including time and memory, used by the
system to produce that program; and

(f) the range, as well as the complexity, of the tasks that can be handled by the
system.


Notice that, according to such a measure, a FORTRAN compiler would rank as
an AP system. However, its rank would be significantly lower than the rank that current AP
research projects could potentially attain.

Another source (see article D3) lists some specific factors that bear on factor (a)
above, the factor concerned with the effort required of the programmer. The specific
factors are informality, language level, and executability. An AP system is informal to the
degree that the user's specification can be ambiguous (various interpretations of the
specification are possible) and partial or incomplete (pieces of information, including perhaps
information about referencing and sequencing, have been omitted). Language level refers to
the degree to which the AP system can accept specifications in a terminology natural to the
problem area under consideration. Executability refers to the degree to which the system can
achieve a desired program state on the basis of a description of that state, that is, the degree
to which the user need only specify what is wanted rather than how to obtain it.
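The contrast between specifying what is wanted and specifying how to obtain it can be illustrated with a small Python sketch. It is not drawn from any system in this chapter; the names `is_sorted_version` and `execute_spec` are hypothetical. The specification is a predicate that only describes the desired output state, and the naive "executor" reaches such a state by blind generate-and-test. The user writes no algorithm at all, which is the sense of executability above, but the search grows with the factorial of the input length, which is why efficiency becomes a separate concern.

```python
from collections import Counter
from itertools import permutations


def is_sorted_version(inp: list[int], out: list[int]) -> bool:
    """Declarative specification (hypothetical): says WHAT the output is, not HOW.
    The output must contain exactly the input's elements, in non-decreasing order."""
    same_elements = Counter(inp) == Counter(out)
    ordered = all(a <= b for a, b in zip(out, out[1:]))
    return same_elements and ordered


def execute_spec(inp, spec):
    """Naive executor: reaches a state satisfying the spec by generate-and-test.
    Correct but exponential; an AP system would aim to produce an efficient program."""
    for candidate in permutations(inp):
        out = list(candidate)
        if spec(inp, out):
            return out
    return None  # no state satisfies the specification


print(execute_spec([3, 1, 2], is_sorted_version))  # [1, 2, 3]
```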

Another definition of AP can be obtained by defining the development phases of a
software system (software development refers to the creation of a program or collection of
programs, from their inception to the completed product). On this basis, AP assists the
programmer with one or more of these phases. For example, in a later article
that describes the PROTOSYSTEM I research project (D7), the development of data-processing
systems (programs) is seen as passing through five phases. First, the programming problem
is defined by clearly identifying and understanding what the desired software is to
accomplish; second, what the program is to do in order to alleviate this problem is clearly and
precisely determined; third, the organization, flow of control, and data representations are
selected from standard implementation possibilities; fourth, this very high-level specification
in terms of standard implementations is transformed into code in some high-level language;
and fifth, this code is compiled.

These, then, are some of the more detailed definitions that have been presented for
AP. Altogether, they define a somewhat amorphous direction of research; there is still no
widespread agreement as to exactly what constitutes AP.
Background

The present period is not the first time the term automatic programming has been used.
The term was employed once before, about twenty years ago, to mean writing a program in a
high-level language (e.g., FORTRAN) and having a compiler transform the program into machine
language code. Thus, one finds "Automatic Coding," Franklin Institute, January 1957 (see
Automatic Coding, 1957), or The Annual Review in Automatic Programming, first appearing in
1959 (see The Annual Review in Automatic Programming, 1960). At that time, when "real"
programming referred to writing a program in machine or assembly language, AP meant writing
a program in FORTRAN. Today, when most programming is done in high-level languages, AP
means programming in a software environment much more advanced than the ones created by
these high-level languages.

Though the early meaning of the term automatic programming differs from the current
meaning, at both times AP meant assisting and automating the process of
writing programs.

In a general way, the forces responsible for AP twenty years ago are similar to those
responsible for its appearance today. At both times there was a feeling that programmers
were burdened with the need to specify many details, with the need to keep track of the
many relations between these details, and with a programming environment that was not,
perhaps, natural to the way in which they thought about the problem. At both times there
was a feeling among some that new programming environments might be within grasp (twenty
years ago the new environments were high-level languages) and that the software
technologies required to realize such environments might be feasible. Out of the desire for
new programming environments and out of the feeling that these new environments might be
attainable, there appeared, in each period, an endeavor called AP.

The current motivations for AP, while similar to those twenty years ago, are more
intense. Today software is costly and unreliable. Much time, money, and effort are currently
being expended, with even greater expenditures forecast for the future. Software is seldom
produced within budget or on time. Quite often the supposedly finished product, when
delivered, fails to meet specifications. As programming applications of increasingly greater
complexity are addressed, not only does reliability become more difficult to attain, but the
costs of software, in terms of time, money, and effort, spiral upward.

To help alleviate these problems, AP aims at a general goal: to restyle the way in which
the programmer specifies the desired program. This restyling should allow the programmer to
think of the problem at a higher and more natural level. AP would like to relieve the
programmer of the mundane portions of programming, that is, the need to keep track of inordinate
amounts of detail. By changing the programming environment, AP could allow programmers to
construct, with greater ease and with greater accuracy, the programs of the present and the
more complex programs of the future.

This goal circles back to a succinct definition of AP: The computer is used as a tool
that automates part of the programming process. That is, the computer performs a portion of
the program-writing activities. Neither the goal nor this definition is especially precise, but
the next sections are more specific. They describe the common characteristics and primary
issues of AP systems.

Characteristics of AP Systems

All AP systems have a specification method, a target language, a problem area, and an
approach or method of operation.

Specification method. Users of an AP system must be given some means or method for
conveying to the system the program that they desire. This means is referred to as the
specification method of the AP system. As will be seen in the remainder of this chapter, AP
systems possess a variety of specification methods. Formal specification methods are those
that might be considered to be very high-level programming languages. In general, the syntax
and semantics of such methods are precisely and definitely defined. Formal methods also
tend to be complete; that is, the specification will completely and precisely indicate what it is
that the program is to accomplish, though, of course, the specification may not indicate the
form of the program or how the program is to accomplish it. On the one hand, many formal
specification methods are not interactive, which is to say, the system does not
interact with the user in order to obtain missing information, to verify hypotheses, or to point
out inconsistencies in the specification. This is comparable to the passive
acceptance of a program's specification by a compiler of a high-level language (e.g.,
FORTRAN). On the other hand, there are some formal specification methods that are
interactive (see McCune, 1978, which emphasizes interactive formal specification
techniques as a natural extension of incremental compiling).

A different method of specification is by examples. Here the user specifies the desired
program by simply giving examples of what the desired program is to do; the AP system
would then construct the desired program. The specification might consist of examples of
the input/output behavior of the desired program, or it might consist of traces of the desired
program's behavior (a trace is an example showing how the program should process a given
input). Specification by examples (or traces) is certainly not complete: The examples do not
describe the behavior of the desired program in all cases.
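As a rough illustration of specification by examples, the Python sketch below enumerates short compositions of a few list primitives and keeps every composition consistent with the given input/output pairs. The primitive set and the names `PRIMITIVES`, `run`, and `synthesize` are hypothetical, chosen only for illustration; systems that infer programs from examples or traces use far more sophisticated methods. The sketch also shows the incompleteness noted above: a single example admits more than one program.

```python
from itertools import product

# A tiny, hypothetical library of primitive list operations.
PRIMITIVES = {
    "reverse": lambda xs: list(reversed(xs)),
    "sort": lambda xs: sorted(xs),
    "tail": lambda xs: xs[1:],
    "double": lambda xs: [2 * x for x in xs],
}


def run(program, xs):
    """Apply a program, given as a sequence of primitive names, left to right."""
    for name in program:
        xs = PRIMITIVES[name](xs)
    return xs


def synthesize(examples, max_len=2):
    """Return every program of length <= max_len that agrees with all I/O examples."""
    found = []
    for length in range(1, max_len + 1):
        for program in product(PRIMITIVES, repeat=length):
            if all(run(program, inp) == out for inp, out in examples):
                found.append(program)
    return found
```

With the single example `([1, 2, 3], [3, 2, 1])`, `synthesize` returns both `('reverse',)` and `('sort', 'reverse')`, which behave differently on `[3, 1, 2]`. Adding the example `([3, 1, 2], [2, 1, 3])` leaves only `('reverse',)`.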

Natural language (e.g., English) is another method of specification. The user specifies
in natural language what the desired program is to do. This method is often interactive (cf.
articles on PSI and NLPQ), checking hypotheses, pointing out inconsistencies, and asking for
further information.

A more detailed discussion of specification, including some advantages and
disadvantages of the various methods, is presented in the article on program specification.
Examples of program specification are found in most of the remaining articles of this chapter.

Target language. The specification method refers to the input to the AP system, and
the target language is concerned with the system's output of the finished program. The
language in which the AP system writes the finished program, or parts of the finished program, is
called the target language. The target languages of the AP systems described in this chapter
are high-level languages such as LISP, PL/I, or GPSS. As an example, suppose that the
target language of an AP system were LISP. The user, possibly employing a very high-level
language, or examples, or natural language, would specify to the AP system what the desired
program is to do. Then the AP system would eventually output a LISP program to do just that.

It is possible to view specification method and target language as relative terms. In an
AP system that carries the process of writing programs through several phases, the input
language for each phase could be thought of as a specification method, and the output
specification as being written in a target language, which then becomes the input
specification method to the next phase. However, in this chapter, target language is usually
reserved for the language in which the output program of the whole AP system is written.

Problem area. Another characteristic of an AP system is its problem area or area of
intended application. Problem area, problem domain, application area, and application domain
are synonymous terms. For some AP systems, the scope of the problem area is relatively
precise; for example, the problem area of the NLPQ system is simple queuing problems, while
the problem area of the PROTOSYSTEM project is input/output-intensive data-processing
systems, including inventory control, payroll, and other record-keeping systems. On the other
hand, the problem area of some AP systems can be relatively large; the application domain of
the PSI system is symbolic computation, including list processing, searching and sorting, data
storage and retrieval, and concept formation. The problem area of a system can have a
bearing on the method of specification, for instance by introducing relevant terminology, and
can influence the method of operation or approach used by the AP system.

Method of operation. The fourth characteristic of AP systems is the approach or
method of operation. AP is too new for there to be very many clear-cut categories of
methods of operation, and the approaches of most systems are not easily categorized. A
separate article on basic approaches discusses some of the more clear-cut methods,
including theorem proving, program transformation, knowledge engineering, automatic data
selection, traditional problem solving, and induction.

In the theorem-proving approach, the user specifies the conditions that must hold for
the input data (to the desired program) and the conditions that the output data should
satisfy. The conditions are specified in some formal language, often the predicate calculus. A
theorem prover is then asked to prove that, for all inputs satisfying the input conditions, there
exists an output that satisfies the output conditions. The proof, then, yields a program: the
desired program can be extracted from it.
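In the notation commonly used to describe this approach (the symbols here are illustrative, not taken from a particular system), an input condition P(x) and an output condition R(x, z) relating input x to output z give a theorem to be proved constructively; the function f extracted from the proof is the desired program. The second part shows, as an example, a specification of an integer square-root program.

```latex
\forall x \,\bigl( P(x) \rightarrow \exists z \; R(x, z) \bigr)
\quad\Longrightarrow\quad
\forall x \,\bigl( P(x) \rightarrow R\bigl(x, f(x)\bigr) \bigr)

% Example: integer square root
P(x) \equiv x \in \mathbb{N}, \qquad
R(x, z) \equiv z \in \mathbb{N} \,\wedge\, z^2 \le x < (z+1)^2
```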

The program transformation approach refers to transforming a specification or
description of a program into an equivalent description of the program. The reason for the
transformation might be to convert a specification that can be easily written and read into
one that is more complicated but more efficient; alternatively, the goal might be to convert a
very high-level description of the program into a description closer to a target language
implementation.
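A minimal Python sketch of the program transformation approach, assuming program descriptions are represented as nested tuples such as `("map", "inc", "xs")`. The three rewrite rules are illustrative and are not taken from any system in this chapter: eliminating a double reversal, fusing two `map` passes into one, and dropping a `sort` whose result is only measured for length. Each rule preserves meaning while reducing work, and `evaluate` lets one check that a description and its transformed version agree on sample data. The names `simplify`, `evaluate`, `inc`, and `dbl` are hypothetical.

```python
# Program descriptions are nested tuples ("op", arg, ...); bare strings name
# variables or primitive functions. All rules and names here are illustrative.
FUNCS = {"inc": lambda v: v + 1, "dbl": lambda v: v * 2}


def rewrite_once(expr):
    """Apply the first rule that matches at the root of expr, or return None."""
    if not isinstance(expr, tuple):
        return None
    op, *args = expr
    first = args[0] if args else None
    # reverse(reverse(X)) -> X: removes two list traversals
    if op == "reverse" and isinstance(first, tuple) and first[0] == "reverse":
        return first[1]
    # map(f, map(g, X)) -> map(compose(f, g), X): fuses two passes into one
    if op == "map" and isinstance(args[1], tuple) and args[1][0] == "map":
        _, g, xs = args[1]
        return ("map", ("compose", first, g), xs)
    # length(sort(X)) -> length(X): sorting never changes the length
    if op == "length" and isinstance(first, tuple) and first[0] == "sort":
        return ("length", first[1])
    return None


def simplify(expr):
    """Simplify subexpressions first, then keep rewriting the root until no rule applies."""
    if isinstance(expr, tuple):
        expr = (expr[0],) + tuple(simplify(a) for a in expr[1:])
    rewritten = rewrite_once(expr)
    return expr if rewritten is None else simplify(rewritten)


def to_function(f):
    """Turn a function name or ("compose", outer, inner) into a Python callable."""
    if isinstance(f, str):
        return FUNCS[f]
    _, outer, inner = f
    fo, fi = to_function(outer), to_function(inner)
    return lambda v: fo(fi(v))


def evaluate(expr, env):
    """Interpret a description on concrete data so two descriptions can be compared."""
    if isinstance(expr, str):
        return env[expr]
    op, *args = expr
    if op == "map":
        fn = to_function(args[0])
        return [fn(v) for v in evaluate(args[1], env)]
    value = evaluate(args[0], env)
    if op == "reverse":
        return value[::-1]
    if op == "sort":
        return sorted(value)
    if op == "length":
        return len(value)
    raise ValueError(f"unknown operator: {op!r}")


spec = ("sort", ("reverse", ("reverse", ("map", "inc", ("map", "dbl", "xs")))))
better = simplify(spec)
print(better)  # ('sort', ('map', ('compose', 'inc', 'dbl'), 'xs'))
print(evaluate(spec, {"xs": [3, 1, 2]}) == evaluate(better, {"xs": [3, 1, 2]}))  # True
```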

Knowledge engineering (see Applications chapter), applicable to many areas in addition
to AP, refers to identifying and explicating knowledge; it often means "realizing" the
knowledge as specific rules that can be added to or removed from the knowledge base of a
system.

Traditional problem solving (see section Search), also applicable to many areas, refers
to the use of goals to direct the choice and application of a set of operators.
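The following Python sketch shows goals directing the choice of operators in the style of means-ends analysis: to achieve a goal, the solver picks an operator that adds it and first achieves that operator's preconditions as subgoals. The `Operator` class, the operator names, and the facts are hypothetical, and the sketch omits the delete lists and goal interactions that a real problem solver must handle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operator:
    name: str
    preconditions: frozenset
    adds: frozenset


def achieve(goal, state, operators, depth=10):
    """Return (plan, new_state) achieving goal from state, or None.
    Simplification: operators have no delete lists, so achieved facts stay true."""
    if goal in state:
        return [], state
    if depth == 0:
        return None
    for op in operators:
        if goal not in op.adds:
            continue  # the goal selects which operators are worth trying
        plan, current = [], state
        for pre in sorted(op.preconditions):
            sub = achieve(pre, current, operators, depth - 1)
            if sub is None:
                break  # a precondition cannot be met; try another operator
            subplan, current = sub
            plan += subplan
        else:
            return plan + [op.name], current | op.adds
    return None


OPS = [
    Operator("read input", frozenset(), frozenset({"have data"})),
    Operator("sort data", frozenset({"have data"}), frozenset({"data sorted"})),
    Operator("print report", frozenset({"data sorted"}), frozenset({"report printed"})),
]
plan, final_state = achieve("report printed", frozenset(), OPS)
print(plan)  # ['read input', 'sort data', 'print report']
```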

These approaches or paradigms overlap, and many systems utilize a method that may,
in part, draw on elements from several. While it is hard to categorize the approaches of AP
systems, there are now enough systems that it is possible to identify some common
issues, and these are the topic of the next section.
Basic Issues

In the article on basic approaches and in all the articles describing the individual
research projects, the reader will find one or more of several basic issues explicitly
addressed: partial information, transformation, efficiency, and understanding.

Partial information. Partial information pertains to systems whose methods of
specification allow for partial or fragmentary descriptions of the desired program: Not all of
the required information is present in the specification, or, where it is present, it may not be
explicit. Since the problem of partial information does not apply to systems that have
complete methods of specification, systems such as DEDALUS, PROTOSYSTEM I, LIBRA, and
PECOS are not concerned with this problem. On the other hand, systems that accept
incomplete specifications, especially natural language specifications, are very much
concerned with partial information. The NLPQ, PSI, and SAFE systems fall in this category. A
classification of the different kinds of missing information that might occur in a natural
language specification is given in the SAFE article.

Usually going hand in hand with the problem of partial information is the problem of
consistency. Incomplete methods of specification often permit inconsistency between
different parts of the same specification. In such cases, the system must check for
inconsistencies and, if they are found, resolve them.

In trying to fill in missing information in one part of the specification, or checking for
consistency between different parts and resolving any discovered inconsistency, the system
may use information that occurs either explicitly or implicitly in other parts of the
specification. Also, it might utilize a knowledge base containing information about the problem
area. Finally, the system may consult the user in an attempt to gain the sought-for
information. One of the explicit devices for utilizing such information is constraints. For
examples of these, see the article on PSI and especially the article on SAFE.
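A minimal Python sketch of the three sources of information just listed, assuming a specification is a dictionary in which missing items are `None`. The queuing-style field names, the single constraint relating arrival rate to mean interarrival time, and the `DOMAIN_DEFAULTS` knowledge base are hypothetical illustrations, not the constraint mechanisms of PSI or SAFE. Items that can be inferred from other parts of the specification or supplied by the knowledge base are filled in, a contradiction is reported as an inconsistency, and whatever remains unknown becomes a question for the user.

```python
# Hypothetical problem-area knowledge: defaults assumed when the user says nothing.
DOMAIN_DEFAULTS = {"queue_discipline": "FIFO"}


def complete_spec(spec, tolerance=1e-9):
    """Return (completed_spec, inconsistencies, questions_for_user)."""
    spec = dict(spec)  # leave the caller's specification untouched
    inconsistencies = []

    # 1. Use information from other parts of the specification, via the
    #    (hypothetical) constraint arrival_rate * mean_interarrival == 1.
    rate, gap = spec.get("arrival_rate"), spec.get("mean_interarrival")
    if rate is None and gap is not None:
        spec["arrival_rate"] = 1 / gap
    elif gap is None and rate is not None:
        spec["mean_interarrival"] = 1 / rate
    elif rate is not None and gap is not None and abs(rate * gap - 1) > tolerance:
        inconsistencies.append("arrival_rate and mean_interarrival disagree")

    # 2. Use the knowledge base about the problem area.
    for key, default in DOMAIN_DEFAULTS.items():
        if spec.get(key) is None:
            spec[key] = default

    # 3. Whatever is still missing must be obtained from the user.
    questions = [key for key, value in spec.items() if value is None]
    return spec, inconsistencies, questions
```

For example, `{"mean_interarrival": 5.0, "arrival_rate": None, "servers": None}` is completed with an arrival rate of 0.2 and a FIFO queue discipline, and the only remaining question is `"servers"`.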

Transformation. Another issue addressed by AP systems is transformation. The term
refers, simply, to transforming a program description, or part of a program description, into
another form. All AP systems use transformation, if only to transform an internal description of
the program into a target language implementation (description). Even a compiler of high-level
languages (e.g., FORTRAN, PL/I, ALGOL) will often transform a program description
several times, taking it through several internal representations, the last of which is the
machine language description. However, a compiler differs from an AP system in that it
applies the transformations in a rigid, predetermined manner; in an automatic programming
system there might be no predetermined way to apply the transformations, their application
depending on an analysis and exploration of the results of applying various transformations.
Systems such as DEDALUS and PECOS, which use extensive transformation on the program
description, have a knowledge base containing many transformation rules that convert parts
of a higher level description into a lower level description, closer to a target language
implementation. Such rules are repeatedly applied to parts of the program description with
the goal of eventually producing descriptions within the target language. These systems
develop a tree of possible descriptions of the program, with each descendant of a node
being the result of a transformation. One of the goals, then, in developing the tree is to find a
description that is a target language implementation of the desired program. Another goal
might be to find an efficient target language implementation.
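The tree-growing search just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the rule base of DEDALUS or PECOS: the toy description language (tuples of construct names), the rule table `RULES`, and the set `TARGET` of directly implementable constructs are all hypothetical.

```python
from collections import deque

# Hypothetical toy description language: a program description is a tuple of
# constructs; constructs in TARGET are directly expressible in the target language.
TARGET = {"hash_table", "linked_list", "loop", "index"}

# Hypothetical transformation rules: each high-level construct may be refined
# into one of several alternative lower-level sequences of constructs.
RULES = {
    "set": [("hash_table",), ("linked_list",)],
    "for_each": [("loop", "index")],
}


def children(description):
    """Yield each description obtained by applying one rule to one construct."""
    for i, construct in enumerate(description):
        for refinement in RULES.get(construct, []):
            yield description[:i] + refinement + description[i + 1:]


def is_implementation(description):
    return all(construct in TARGET for construct in description)


def synthesize(spec):
    """Grow the tree of descriptions breadth-first (pruning duplicate nodes)
    and return every leaf that is a target-language implementation."""
    frontier, seen, implementations = deque([spec]), {spec}, []
    while frontier:
        node = frontier.popleft()
        if is_implementation(node):
            implementations.append(node)
            continue
        for child in children(node):
            if child not in seen:
                seen.add(child)
                frontier.append(child)
    return implementations


print(synthesize(("set", "for_each")))
# [('hash_table', 'loop', 'index'), ('linked_list', 'loop', 'index')]
```

Because more than one rule applies to the abstract `set`, the search ends with two alternative implementations; deciding which one to keep is the efficiency question taken up next.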
Other AP systems may use transformation rules in various ways. For instance, the NLPQ
system uses transformation rules to parse the natural language input from the user, to
generate natural language output to the user, and to generate the target language program
from an internal description.

Efficiency. Another concern of AP systems is the efficiency of the target language
implementation. The two projects that address this issue are PROTOSYSTEM I and the PSI
subsystem LIBRA. While the PROTOSYSTEM approach to creating efficient programs combines
artificial intelligence with the mathematical technique of dynamic programming, the LIBRA
approach uses a more extensive range of artificial intelligence techniques, employing a
variety of heuristics, estimates, and kinds of knowledge to guide its search for an efficient
program.

When it is said that an AP system optimizes a program for efficiency, this does not mean
that the system finds the absolutely most efficient implementation; combinatorial explosion
makes such a task impossible. Instead, optimizing means making reasonable choices in
the implementation so as to achieve a reasonably efficient program.
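A small Python sketch can make the combinatorial point concrete. All numbers below are invented for illustration: three abstract objects, each with two candidate implementations and an estimated cost, plus an assumed penalty whenever two interacting objects use different representations. Exhaustive search evaluates k**n combinations (here 2**3 = 8, but 4**20, about a trillion, for 20 objects with 4 choices each), while a cheap heuristic makes local choices.

```python
from itertools import product

# Hypothetical estimated costs of candidate implementations for three abstract
# objects a, b, c used together in one program (a-b and b-c interact).
COSTS = [
    {"list": 10, "hash": 12},  # object a
    {"list": 15, "hash": 9},   # object b
    {"list": 8, "hash": 11},   # object c
]
MISMATCH_PENALTY = 5  # hypothetical conversion cost between interacting objects


def total_cost(choice):
    """Estimated cost of one combination of implementation choices."""
    cost = sum(COSTS[i][rep] for i, rep in enumerate(choice))
    cost += sum(MISMATCH_PENALTY for r1, r2 in zip(choice, choice[1:]) if r1 != r2)
    return cost


def exhaustive():
    """Try every combination: k ** n candidates for n objects with k choices each."""
    candidates = product(*(sorted(c) for c in COSTS))
    best = min(candidates, key=total_cost)
    return best, total_cost(best)


def greedy():
    """A cheap heuristic: pick each object's locally cheapest implementation."""
    choice = tuple(min(c, key=c.get) for c in COSTS)
    return choice, total_cost(choice)


print(exhaustive())  # (('hash', 'hash', 'hash'), 32)
print(greedy())      # (('list', 'hash', 'list'), 37)
```

The heuristic answer costs 37 against an optimum of 32: reasonable rather than optimal, which is the sense of "optimization" intended above.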

Understanding. The basic concern of one of the systems below, the PROGRAMMER'S
APPRENTICE, pertains more to "understanding" the program than to the basic concerns
of partial information, transformation, or efficiency. In this context, understanding a program
might be defined as that which enables a system to talk about, analyze, modify, or write
parts of a program. It is the intention of the PROGRAMMER'S APPRENTICE (though it should be
kept in mind that at present this system is not yet operational) to realize program
understanding through the explicit use of plans. A plan represents one particular
understanding, or way of viewing, a program or part of a program (for a more detailed
explanation, see the article on the PROGRAMMER'S APPRENTICE). Understanding in the other
systems is relatively implicit and does not reside in any one particular class of
structures.


Overview of the Systems Articles

The projects described in the system articles cover much of the current research in
AP, including the four basic issues just discussed: transformation rules, search for efficiency,
handling partial information, and explicit understanding.

The NLPQ system is the first AP system to use natural language dialogue as a
specification method. The user specifies part of a simple queuing simulation problem in
English, and then the system, as necessary, answers questions posed by the user and
queries the user in order to complete missing information or to resolve inconsistencies.
The partial knowledge that the system has obtained about the desired program is
represented as a semantic net that is eventually used to generate the program in the target
language GPSS. Transformation or production rules analyze the user's natural language
specification, build and modify the semantic net, produce natural language responses, and
finally generate the target language program.

The PSI system is more recent and consists of many subsystems; it stresses the
integration of a number of different processes and sources of knowledge. The problem
application area is symbolic programming, including information retrieval, simple sorts, and
concept formation. The user can specify the desired program with a mixture of examples
and mixed-initiative natural language dialogue; for an easier and more natural interaction with
the user, the system maintains and uses a tree of the topics that occur during the
specification dialogue. Through such a dialogue, PSI creates a complete, consistent
description of the desired program. In the last phase, the system explores repeated
application of transformation rules in order to convert the description into a target language
implementation. This last phase, the synthesis phase, is carried out by two subsystems:
PECOS provides suitable transformation rules, and LIBRA directs and explores the application
of the rules, with the goal of obtaining an efficient target implementation. PECOS and LIBRA
are described in separate articles.

Both PECOS and DEDALUS are examples of full-fledged, dynamic transformation
systems. Each starts out with a complete specification of the desired program, and each has
a knowledge base of many transformation rules that are repeatedly applied to the
specification. These repeated applications produce a sequence of specifications that
eventually terminates with a specification that is a target language implementation. Because
more than one transformation rule can apply in some cases, each system actually develops a
tree of specifications (descriptions), with one or more of the final nodes of the
tree eventually being a program implementation within the target language. Part of the difference
between these two systems lies in the fact that DEDALUS is concerned with the logic of
such programming concepts as recursion and subroutines. PECOS, on the other hand, is more
concerned with the multiplicity of implementations of very high-level programming constructs
and operations, because that is its task within the PSI system. Though PECOS stresses
knowledge of various implementations and DEDALUS stresses knowledge of programming
constructs, both are systems in which transformation is the primary emphasis.

The SAFE system article contains an extensive description of constraints and their use
in handling partial information. SAFE processes a variety of different kinds of constraints in
order to fill in different kinds of information in the specification of the desired program, and
it employs different methods of processing these constraints. There are constraints related to
the type of object referenced in the specification, as well as constraints related to the sequencing of steps.
Constraints are processed by backtracking and by carrying out a form of symbolic execution.
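The backtracking half of this idea can be sketched in Python. The example below is hypothetical and far simpler than SAFE: a partial specification ("Read the records from the file. Sort them by date. Print them.") leaves the referents of two pronouns unstated, and a type constraint and a sequencing constraint, of the two kinds mentioned above, determine them. The object table, step numbers, and function names are assumptions for illustration; SAFE's symbolic execution is not modelled.

```python
# Hypothetical objects mentioned in the partial specification, with their types
# and the step at which each first becomes available.
OBJECT_TYPE = {"file": "file", "date": "field", "records": "sequence"}
CREATED_AT_STEP = {"file": 0, "date": 0, "records": 1}

# Missing information: the referent of each pronoun, the step it occurs in,
# and the type its operation (sort, print) expects.
PRONOUN_STEP = {"them@sort": 2, "them@print": 3}
REQUIRED_TYPE = {"them@sort": "sequence", "them@print": "sequence"}


def type_constraint(assignment):
    """Each referent must have the type its operation expects."""
    return all(OBJECT_TYPE[obj] == REQUIRED_TYPE[p] for p, obj in assignment.items())


def sequencing_constraint(assignment):
    """An object can only be referenced after the step that creates it."""
    return all(CREATED_AT_STEP[obj] < PRONOUN_STEP[p] for p, obj in assignment.items())


CONSTRAINTS = [type_constraint, sequencing_constraint]


def fill_in(variables, domain, assignment=None):
    """Chronological backtracking: bind variables one at a time and undo a
    binding as soon as any constraint fails on the partial assignment."""
    assignment = {} if assignment is None else assignment
    if len(assignment) == len(variables):
        return dict(assignment)
    var = variables[len(assignment)]
    for value in domain:
        assignment[var] = value
        if all(check(assignment) for check in CONSTRAINTS):
            result = fill_in(variables, domain, assignment)
            if result is not None:
                return result
        del assignment[var]
    return None


print(fill_in(["them@sort", "them@print"], ["file", "date", "records"]))
# {'them@sort': 'records', 'them@print': 'records'}
```

If no binding satisfies the constraints, `fill_in` returns `None`; in that case a system could fall back on a knowledge base or a query to the user.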

One of the ideas of the SAFE project is that a completely specified program satisfies a
very large number of constraints. The information in the user's partial, fragmentary specification
(partial and fragmentary because the specification does not mention all objects explicitly,
mentions other objects only partially, and may not contain explicit sequencing of actions),
combined with the many constraints that a formal program satisfies (and possibly with
information from a knowledge base of the application area or, in special cases, with
information obtained from queries to the user), fully determines a complete
and formal description of the program. No other system deals in so central a way with partial
information and constraints as does the SAFE system.

The LIBRA and PROTOSYSTEM I projects are concerned with the efficiency of the target
language implementation. LIBRA uses an artificial intelligence approach, while PROTOSYSTEM
I combines some artificial intelligence with, primarily, the mathematical approach
of dynamic programming. Dynamic programming, modified by approximations and heuristics,
produces an optimized target language implementation. LIBRA, on the other hand, guides the
application of the transformation rules furnished by the PECOS subsystem of PSI and directs
the growth of the resulting tree (see the above discussion of PECOS) with the goal of finding an
efficient target implementation. LIBRA determines and uses estimates of what it is likely to
achieve by exploring the development of a particular node. LIBRA also has knowledge about how
its own allocation of space and time should influence its strategy in searching for an efficient
implementation. Though both LIBRA and PROTOSYSTEM I are concerned with producing
efficient implementations, they approach the problem in different contexts: PROTOSYSTEM I
explores configurations of a data-processing program, and LIBRA explores applications
of transformation rules.
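To show why dynamic programming is attractive when a problem has the right structure, here is a generic Python sketch. It is not PROTOSYSTEM I's actual formulation: it assumes, hypothetically, that implementation decisions form a chain in which each object's cost depends only on its own choice and its neighbour's. Under that assumption the optimum is found with work proportional to n * k**2 rather than k**n.

```python
# Hypothetical chain of decisions: estimated cost of each candidate
# implementation per object, plus a penalty when neighbours differ.
COSTS = [
    {"list": 10, "hash": 12},
    {"list": 15, "hash": 9},
    {"list": 8, "hash": 11},
]
MISMATCH_PENALTY = 5


def optimize_chain(costs, penalty):
    """Dynamic programming over the chain: best[rep] is the cheapest total cost
    of the objects seen so far, given that the current object uses rep."""
    best = dict(costs[0])
    back = []  # back[i][rep]: best representation for the object before stage i + 1
    for stage in costs[1:]:
        new_best, pointers = {}, {}
        for rep, cost in stage.items():
            prev = min(best, key=lambda p: best[p] + (penalty if p != rep else 0))
            new_best[rep] = cost + best[prev] + (penalty if prev != rep else 0)
            pointers[rep] = prev
        best = new_best
        back.append(pointers)
    # Follow the back-pointers from the cheapest final choice.
    rep = min(best, key=best.get)
    total = best[rep]
    choice = [rep]
    for pointers in reversed(back):
        rep = pointers[rep]
        choice.append(rep)
    return tuple(reversed(choice)), total


print(optimize_chain(COSTS, MISMATCH_PENALTY))  # (('hash', 'hash', 'hash'), 32)
```

The invented costs are the same kind used earlier to illustrate exhaustive search; dynamic programming reaches the same optimum of 32 without enumerating every combination.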

The PROGRAMMER'S APPRENTICE is not necessarily intended to write the program, but
instead to function as an apprentice to the user, with the user writing none, some, or all of
the program and the apprentice assisting with such tasks as writing parts of the program,
checking for consistency, explaining pieces of the program, and helping the user modify
programs. The central concern of this project is understanding, through the explicit device of
plans. A plan may be thought of as a template that expresses a viewpoint. Matching the
plan to a part of a program description corresponds to understanding that part in that way.
Several plans can match the same part of a program, corresponding to different ways of
understanding that part. Plans can also be built up in a hierarchical fashion. The goal is that
the PROGRAMMER'S APPRENTICE, with the understanding attained through the use of plans,
can assist the programmer with correcting mistakes, writing parts of the program, and
effecting modifications.
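The idea of a plan as a template can be illustrated with a small Python matcher. This is only an analogy under a simplifying assumption: the PROGRAMMER'S APPRENTICE's plans are considerably richer than this, whereas here a plan is just a sequence of step templates whose strings beginning with `?` are variables. The program fragment and the two plans are hypothetical; both plans match overlapping parts of the same fragment, giving two ways of "understanding" it.

```python
def is_var(term):
    """Plan variables are strings beginning with '?'."""
    return isinstance(term, str) and term.startswith("?")


def unify(pattern, term, bindings):
    """Match one plan pattern against one program term, extending bindings;
    return None if they do not match."""
    if is_var(pattern):
        if pattern in bindings:
            return bindings if bindings[pattern] == term else None
        return {**bindings, pattern: term}
    if isinstance(pattern, tuple) and isinstance(term, tuple):
        if len(pattern) != len(term):
            return None
        for p, t in zip(pattern, term):
            bindings = unify(p, t, bindings)
            if bindings is None:
                return None
        return bindings
    return bindings if pattern == term else None


def find_matches(plan, fragment):
    """Yield (start, bindings) for each contiguous run of steps the plan matches."""
    for start in range(len(fragment) - len(plan) + 1):
        bindings = {}
        for pattern, step in zip(plan, fragment[start:start + len(plan)]):
            bindings = unify(pattern, step, bindings)
            if bindings is None:
                break
        if bindings is not None:
            yield start, bindings


# Hypothetical fragment: total := 0; for x in items: total := total + x
FRAGMENT = [
    ("assign", "total", 0),
    ("loop_over", "items", "x"),
    ("assign", "total", ("+", "total", "x")),
]

PLANS = {
    "running total": [
        ("assign", "?acc", 0),
        ("loop_over", "?seq", "?elem"),
        ("assign", "?acc", ("+", "?acc", "?elem")),
    ],
    "iteration over a sequence": [("loop_over", "?seq", "?elem")],
}

for name, plan in PLANS.items():
    for start, bindings in find_matches(plan, FRAGMENT):
        print(name, start, bindings)
```

The "running total" plan matches the whole fragment, binding `?acc` to `total`; the "iteration" plan matches only the loop step. A hierarchical scheme could treat the second as a sub-plan of the first.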

All of these are research projects: at present, none has yet produced an AP system in
production use. Much research remains before most of these systems can be of use to
programmers.


References

See The Annual Review in Automatic Programming (1960), Automatic Coding (1957),
Balzer (1973a), Balzer (1973b), Balzer (1973c), Biermann (1976a), Hammer (1977), Hammer
& Ruth (1979), Heidorn (1976), Heidorn (1977), and McCune (1976).

Further references for specific research areas are listed with the other articles in this
chapter.




Automatic Programming


A. Methods of Specification

There must be some means or method by which the user conveys to the AP system the
kind of program that he wants. This method is called the program specification. It might entail
fully specifying the program in some formal programming language or possibly just specifying
certain properties of the program. It might involve giving examples of the input and the
output of the desired program, giving formal constraints on the program in the predicate
calculus, or giving interactive descriptions of the program at increasing levels of detail in
English. (Specification is introduced in general terms in the overview article.)


Formal Specifications

One method of formal specification is that used with the basic approach of theorem
proving (see below for this basic approach). Here one might specify a program as

$$ \forall s_1\, \bigl(P(s_1) \supset \exists s_2\, Q(s_1, s_2)\bigr) \tag{1} $$

where s1 are the input variables and s2 are the output variables. P(s1) is the input
predicate (or input specification); it gives the conditions that the inputs, s1, can be
expected to satisfy at the beginning of program execution. Q(s1, s2) is the output predicate
(or output specification); it gives the conditions that the outputs, s2, of the desired program are
expected to satisfy, given the inputs s1.

Expression (1) states that for all s1, the truth of P(s1) implies that there is an s2 such that
Q(s1, s2) is true. If there are no restrictions on the inputs, one may simply write

$$ \forall s_1\, \exists s_2\, Q(s_1, s_2). $$


For example, a program that computes the greatest common divisor of two integers x
and y might be specified by taking P(x, y) as the condition that x and y are positive, and
Q(x, y, z) as the condition that z is their greatest common divisor. P(x, y) could be written as

$$ x > 0 \text{ and } y > 0, $$


and Q(x, y, z) could be written as

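$$ \mathit{divide}(z, x) \text{ and } \mathit{divide}(z, y) \text{ and } \forall r\, \bigl((r > 0 \text{ and } \mathit{divide}(r, x) \text{ and } \mathit{divide}(r, y)) \supset r \le z\bigr). $$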

The expression

$$ \forall x\, \forall y\, \exists z\, \bigl(P(x, y) \supset Q(x, y, z)\bigr) $$

would then state that for all positive integers x and y, there is a z such that z is their
greatest common divisor.
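To make the roles of P and Q concrete, the sketch below transcribes them into Python so that a candidate program can be checked on particular inputs. This is testing, not proof: a theorem prover establishes the specification for all inputs, whereas this check covers only the cases tried. Python's `math.gcd` serves as the candidate program; the helper names `divide`, `P`, `Q`, and `meets_spec` are illustrative.

```python
from math import gcd


def divide(d, n):
    """divide(d, n): d is a divisor of n (d must be nonzero)."""
    return n % d == 0


def P(x, y):
    """Input predicate: x and y are positive."""
    return x > 0 and y > 0


def Q(x, y, z):
    """Output predicate: z divides x and y, and every positive common divisor r
    satisfies r <= z. The quantifier over r is checked only for r up to
    min(x, y), since no positive common divisor can be larger."""
    if z <= 0:
        return False  # r = 1 is always a common divisor, so z must be >= 1
    if not (divide(z, x) and divide(z, y)):
        return False
    return all(r <= z for r in range(1, min(x, y) + 1) if divide(r, x) and divide(r, y))


def meets_spec(program, x, y):
    """Check that P(x, y) implies Q(x, y, program(x, y)) for one input."""
    return not P(x, y) or Q(x, y, program(x, y))


print(meets_spec(gcd, 12, 18))              # True
print(meets_spec(lambda x, y: 1, 12, 18))   # False
```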


In the basic approach for this kind of specification, the above expression is given to a
theorem prover that produces a proof from which a program can be extracted (see the basic
approach of theorem proving below). One is required to give to the theorem prover enough
facts concerning any predicates and functions that occur in P and Q so that (1) is provable.
Thus, in the above, one would have to specify a number of facts concerning the predicates
"divide", ">", and "≤" over the integers.

Another, very similar method of specification is that used with the basic approaches of
program transformation and of very high-level languages. This specification method stresses
the use of entities that are not immediately implementable on a computer, or at least not
implementable with some desired degree of efficiency. There is considerable leeway in this
classification. For instance, in some program transformation systems the entities employed
may be quite abstract, without any hint of the desired algorithm. In other systems the
algorithm most naturally suggested by the specification of the program could be inefficient,
but the AP system will produce an efficient, if perhaps convoluted, program.

One example of a specification used with program transformation is (see article 06)

$$ \gcd(x, y) \Leftarrow \textbf{compute}\ \max\{z : \mathit{divide}(z, x) \text{ and } \mathit{divide}(z, y)\} $$

where x and y are nonnegative integers, not both zero.


This expression states that the gcd (greatest common divisor) of x and y is the
maximum of all those z such that z divides both x and y. Furthermore, it is assumed that x and y
are nonnegative integers, one of which is nonzero (if both were zero, every z would divide them and
no maximum would exist). By successive transformations of this definition of gcd, the system
would produce an efficient recursive program. Another example (Darlington & Burstall, 1973, p. 280) is

$$ \mathit{factorial}(x) \Leftarrow \textbf{if}\ x = 0\ \textbf{then}\ 1\ \textbf{else}\ \mathit{times}(x, \mathit{factorial}(x - 1)). $$

By various transformations, the system then produces a more efficient nonrecursive, though
more tortuous, program.
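The contrast between a specification and the efficient program derived from it can be seen by running both side by side. The Python below is hand-written for illustration; it is not the output of DEDALUS or of Darlington and Burstall's system. `gcd_spec` reads the gcd specification literally (trying every candidate z), `gcd_recursive` is Euclid's algorithm, the kind of recursive program such a system aims to reach, and `factorial_iterative` is one nonrecursive equivalent of the factorial definition.

```python
def gcd_spec(x, y):
    """Direct reading of the specification: the largest z dividing both x and y.
    Assumes x and y are nonnegative integers, not both zero."""
    return max(z for z in range(1, max(x, y) + 1) if x % z == 0 and y % z == 0)


def gcd_recursive(x, y):
    """Euclid's algorithm: an efficient recursive program meeting the same spec."""
    return x if y == 0 else gcd_recursive(y, x % y)


def factorial(x):
    """The recursive definition above, transcribed directly."""
    return 1 if x == 0 else x * factorial(x - 1)


def factorial_iterative(x):
    """A nonrecursive equivalent using an accumulator."""
    result = 1
    while x > 0:
        result *= x
        x -= 1
    return result


print(gcd_spec(12, 18), gcd_recursive(12, 18))   # 6 6
print(factorial(5), factorial_iterative(5))      # 120 120
```

`gcd_spec` performs up to max(x, y) trial divisions, whereas Euclid's algorithm needs only a number of steps logarithmic in its inputs; the transformation systems aim to bridge exactly this gap while preserving correctness.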


Advantages and Disadvantages of Formal Specifications

The first specification method, which involves input and output predicates and is
based on formal logic, is completely general: anything can be specified. On the other hand,
the user must have a sufficient understanding of the desired behavior of the program in
order to give a full formal description of the input and output. Achieving this understanding can
sometimes be difficult, even for simple programs. Also, the present state of theorem provers
and problem-reduction methods makes synthesis of longer programs difficult.

The second type of formal specification does not have such arbitrary generality, but
the terminology used in the specification is often closer to our way of thinking about a
particular subject, and so it should be easier to create such specifications.

Even though some of the above formal methods are arbitrarily general and others are
not, they are all complete: the specification of the desired program fully and completely
specifies what the program is to do. This is not true of some of the other methods discussed
below, where the specification does not uniquely determine what the program is to do. With
such methods it becomes a concern whether the program produced by the system is actually
what the user desires, and a system employing such a method may need to verify
whether the program it produces is the program that the user wants. With the specification
methods discussed here, on the other hand, there is no such problem. For further reading on
this subject, see Bibel, Furbach, & Schreiber (1978).


Specification by Examples

Some simple programs are most easily described using examples of what the program is
supposed to do.

Examples of input/output pairs. In this specification method, the user gives examples
of typical inputs and the corresponding outputs. Consider specifying or describing the
concatenation of lists to someone who is unfamiliar with the term "concatenation." It might
be most straightforward to use an example:

concat[(A B C), (D E)] = (A B C D E),

which states that when the input of the function "concat" consists of the two lists (A B C)
and (D E), the corresponding output is (A B C D E).

Given certain commonsense assumptions, this example input/output pair should suffice
to specify what the desired program is to do. In more complicated cases, where the
commonsense assumptions are not sufficient, more examples must be given in order to
specify the program uniquely. For instance, the above example could be misinterpreted as a
"constant" program that always gives (A B C D E) as output:

concat[x, y] = (A B C D E).

In such a case, giving an additional example

concat[(L M), (N O P)] = (L M N O P)

would probably clear up any confusion.
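How an additional example removes ambiguity can be shown by testing candidate programs against the examples. In this Python sketch, lists are modelled as Python lists of strings, and the two candidate programs (`concat` and `constant`) and the helper `consistent` are illustrative, not part of any particular AP system.

```python
def concat(x, y):
    """Hypothesis 1: append the second list to the first."""
    return x + y


def constant(x, y):
    """Hypothesis 2: ignore the inputs and always return (A B C D E)."""
    return ["A", "B", "C", "D", "E"]


def consistent(program, examples):
    """True if the program reproduces every input/output pair."""
    return all(program(*inputs) == output for inputs, output in examples)


FIRST = [((["A", "B", "C"], ["D", "E"]), ["A", "B", "C", "D", "E"])]
BOTH = FIRST + [((["L", "M"], ["N", "O", "P"]), ["L", "M", "N", "O", "P"])]

for examples in (FIRST, BOTH):
    print([p.__name__ for p in (concat, constant) if consistent(p, examples)])
# prints ['concat', 'constant'], then ['concat']
```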

Another instance of this method is the specification of the function "prime" by a set of
input/output pairs:

prime(1) = 1
prime(2) = 2
prime(3) = 3
prime(4) = 5
prime(5) = 7
prime(6) = 11
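One program consistent with these pairs is sketched below in Python; it is an illustrative reading, not a program from the source. Note that the pairs count 1 as the first term. Like any finite list of examples, they also admit other programs (for instance, one that simply looks the six values up in a table), which is exactly the ambiguity discussed above.

```python
from math import isqrt


def is_prime(n):
    """Trial division up to the integer square root of n."""
    return n >= 2 and all(n % d for d in range(2, isqrt(n) + 1))


def prime(k):
    """k-th term of 1, 2, 3, 5, 7, 11, ...: 1 followed by the primes."""
    if k == 1:
        return 1
    count, n = 1, 1
    while count < k:
        n += 1
        if is_prime(n):
            count += 1
    return n


EXAMPLES = {1: 1, 2: 2, 3: 3, 4: 5, 5: 7, 6: 11}
print(all(prime(k) == v for k, v in EXAMPLES.items()))  # True
```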

Generic examples of input/output pairs. In certain cases, generalizations of specific
examples, or generic examples, are more useful in order to avoid the problems inherent in
partial specifications. For instance, the generic example


reverse[(X1 X2 X3 ... Xn)] = (Xn ... X3 X2 X1)




describes a list reversal function. Here, X1, X2, ..., Xn are variables that may be
anything. This specification is still partial but is more complete than any specification of this
function given by examples of input/output pairs.

Program traces. Traces allow more imperative specifications than do example pairs. A
sorting program may be specified with input/output pairs (e.g., Green et al., 1974):

sort[(3 1 4 2)] = (1 2 3 4),

but it would be hard to specify an insertion sort program in the same way. A program
trace, however, could express such a program as follows:

sort[(3 1 4 2)] -> ( )
(1 4 2) -> (3)
(4 2) -> (1 3)
(2) -> (1 3 4)
( ) -> (1 2 3 4)
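A program that produces exactly this trace when run is sketched below in Python. Each trace entry is a pair (remaining input, sorted output so far). The trace does not say how the insertion point is found; scanning from the left is an assumption of this sketch.

```python
def insertion_sort_trace(items):
    """Insertion sort that records (remaining input, sorted output) after each step."""
    remaining, result = list(items), []
    trace = [(tuple(remaining), tuple(result))]
    while remaining:
        x = remaining.pop(0)
        i = 0
        while i < len(result) and result[i] < x:  # scan left to right for the slot
            i += 1
        result.insert(i, x)
        trace.append((tuple(remaining), tuple(result)))
    return trace


for remaining, sorted_so_far in insertion_sort_trace([3, 1, 4, 2]):
    print(remaining, "->", sorted_so_far)
```

The five printed lines correspond one-for-one to the five lines of the trace above.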

Another example of specification by traces might be

gcd(12, 18) ->
(6, 12) ->
(0, 6) ->
6

for the specification by trace of the Euclidean algorithm that computes the greatest common
divisor. An example of using a trace to specify part of a concept formation program is
presented in 03.
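The gcd trace can likewise be reproduced by a small Python program. The step rule (x, y) -> (y mod x, x), repeated until x becomes 0, is inferred from the values in the trace; it is one common formulation of Euclid's algorithm, assumed here for illustration.

```python
def euclid_trace(x, y):
    """Return the (x, y) pairs visited and the final result, using the step
    (x, y) -> (y mod x, x) until x becomes 0."""
    pairs = [(x, y)]
    while x != 0:
        x, y = y % x, x
        pairs.append((x, y))
    return pairs, y


print(euclid_trace(12, 18))  # ([(12, 18), (6, 12), (0, 6)], 6)
```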

More formally, a trace may be defined as follows. A programming domain can be thought
of as consisting of a set of abstract objects, a set of possible representations (called data
structures) for these abstract objects, a basic set of operators to transform these
representations, and a class of questions or predicates that can be evaluated on these data
structures. A programming domain thus characterizes a class of programs that might be
constructed to operate on representations of the set of abstract objects in the domain. For
a given program operating on some data objects in the domain, a trace is a sequence of
changes of these data structures, together with the control flow decisions that caused these
changes during execution of the program.

Traces are usually expressed in terms of domain operators and tests (or functional
compositions of these). Traces are classified as complete if they carry all information about
operators applied, data structures changed, control decisions taken, etc.; otherwise, they
are called incomplete. An interesting subclass of the latter is the class of protocols, in which
all data modifications are explicit but all control information (e.g., predicate evaluations that
determine control flow) is omitted. A protocol is then a sequence of data structure state
snapshots and operation applications (for a more complete definition, see the article on
basic approaches).
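The distinction between a complete trace and a protocol can be illustrated in Python with the Euclidean example. The event format, tagged tuples for states, operations, and tests, is a hypothetical encoding chosen for this sketch; a protocol is obtained by discarding the control information (the tests) and keeping the state snapshots and operation applications.

```python
def euclid_complete_trace(x, y):
    """Complete trace: data states, operations, and the control tests that chose them."""
    events = [("state", (x, y))]
    while True:
        is_zero = x == 0
        events.append(("test", "x == 0", is_zero))
        if is_zero:
            break
        events.append(("op", "(x, y) := (y mod x, x)"))
        x, y = y % x, x
        events.append(("state", (x, y)))
    return events


def protocol(trace):
    """Protocol: keep data snapshots and operation applications, drop control information."""
    return [event for event in trace if event[0] != "test"]


for event in protocol(euclid_complete_trace(12, 18)):
    print(event)
```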

Generic traces. Like generic examples of input/output pairs, these may also be
useful. In general, there is a whole spectrum of trace specifications, depending on how much
imperative information and how much descriptive information is present in the trace. For instance, the
trace above is completely descriptive; traces that contain function applications and/or
sequencing information tend to be more imperative.


Advantages and Disadvantages of Specification by Examples

As stated above, generic examples are less ambiguous than nongeneric examples.
Traces are less ambiguous than input/output pairs, but the user is required to have in mind
some idea of how the desired program is to function. On the other hand, traces do allow
some imperative specification of the flow of control.

Specification by examples can be natural and easy for the user to formulate (Manna,
1977). Examples have the limitations inherent in informal program specifications: the user
must choose examples so as to specify the desired program unambiguously. The AP system
must be able to determine when the user's specification is consistent and complete, and that
the system's "model" of what the user wants is indeed the right program.


Natural Language Specifications

Given an appropriate conceptual vocabulary, English descriptions of algorithms are
often the most natural method of specification. Part of the reason is that natural language
allows greater flexibility in dealing with basic concepts than do, say, very high-level
languages. This flexibility requires a fairly sophisticated representational structure for the
model, with capabilities for representing the partial (incomplete) and often ambiguous
descriptions that users provide. In addition, it may be necessary to maintain a database of
domain-dependent knowledge for certain applications. Experience with implemented
systems, such as SAFE (Balzer, Goldman, & Wile, 1977a; see also 03), suggests that the
relevant issues lie not in the area of natural language processing but in how the
specifications are modeled in the system and what "programming knowledge" the system
must have.


Mixed-Initiative Natural Language Dialogue

More versatile, this specification method involves interaction between the user and the
system as the system builds and tries to fill in the details in its model of the algorithm. In
addition to maintaining a model of the algorithm, such systems sometimes even maintain a
kind of model of the user, to help the system tailor the dialogue to a particular user's
idiosyncrasies. Various techniques mentioned previously, such as examples or traces, could
be used in the dialogue as a description of some part of the algorithm. The system might be
designed so as to allow users to be as vague or ambiguous as they please; the system will
ultimately ask them enough to fill in the model.

This method is probably the closest to the usual method of program specification used
by people, allowing both the specifier and the programmer to make comments and
suggestions. Users do not have to keep every detail in mind, nor do they have to present
details in a certain order. The system will eventually question the user about missing details or
ambiguous specifications. On the other hand, this method requires a system that deals with
many problems of natural language translation, generation, and representation.
Representation is also required for the system's model of the algorithm.




The PSI system (Green, 1976b; see also DO) and the NLPQ system (Heidorn, 1974; see
also 08) use this method of program specification. Floyd (1972) and Green (1977) give
hypothetical dialogues with such a system, illustrating the problems that researchers have
encountered with this approach.


References 

See Biermann (1976a) and Heidorn (1977). For examples of individual specification
methods, see the remaining articles of this chapter.
B. Basic Approaches

The following are some of the basic approaches used in automatic programming (AP)
systems to synthesize desired programs from user specifications. There is not always a
clear distinction between synthesis and specification. Furthermore, as will be seen from the
later articles, some systems employ primarily one approach, while others employ more
elaborate paradigms that use several approaches. (Synthesis and specification are
introduced in the overview article.)


Theorem Proving

The theorem-proving approach is used for the synthesis of programs whose input and
output conditions can be specified in the formalism of the predicate calculus. As stated in
the section on formal specifications, the user specifies the desired program for the theorem
prover as an assertion to be proved. This assertion usually takes the following form (Green, 1969):

$$ \forall s_1\, \bigl(P(s_1) \supset \exists s_2\, Q(s_1, s_2)\bigr), $$


where s1 is one or more input variables, s2 is one or more output variables, P is the
predicate that s1 is expected to satisfy, and Q is the predicate that s2 is expected to
satisfy after execution of the desired program. In addition to the above expression, the
theorem prover must also be given enough axioms to make the above expression provable.

From the proof produced by the theorem prover, a program is extracted. For instance,
certain constructs in the proof will produce conditional statements; others, sequential
statements; and occurrences of induction axioms may produce loops or recursion. There are
several variant methods of accomplishing these results (see Waldinger & Levitt, 1974;
Kowalski, 1974; Clark & Sickel, 1977).

Although any interesting example would be far too long to work out in all of its detail
here, it may be worthwhile to show how such a problem is set up. The interested reader is
referred to Green (1969) for a more complete development of the following example.
Consider the very simple problem of sorting a dotted pair of two distinct numbers in LISP.
The axioms that would prove useful for this synthesis would be:

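$$
\begin{aligned}
&1)\quad x = \mathit{car}(\mathit{cons}(x, y))\\
&2)\quad y = \mathit{cdr}(\mathit{cons}(x, y))\\
&3)\quad x = \mathit{nil} \supset \mathit{cond}(x, y, z) = z\\
&4)\quad x \ne \mathit{nil} \supset \mathit{cond}(x, y, z) = y\\
&5)\quad \forall x, y\, \bigl(\mathit{lessp}(x, y) \ne \mathit{nil} \equiv x < y\bigr)
\end{aligned}
$$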


The specification of the desired program, and the theorem to be proved, would be:

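$$ \forall x\, \exists y\, \Bigl(\bigl[\mathit{car}(x) < \mathit{cdr}(x) \supset y = x\bigr] \wedge \bigl[\mathit{car}(x) \ge \mathit{cdr}(x) \supset \mathit{car}(x) = \mathit{cdr}(y) \wedge \mathit{cdr}(x) = \mathit{car}(y)\bigr]\Bigr), $$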


which says that for every dotted pair input x, there is a dotted pair output y such that if x is
already sorted, then y is the same as x, and if x is not sorted, then y is the interchange of
the two elements of x. Using the techniques of resolution theorem proving (see Theorem
Proving-C), we would obtain the following program:

$$ y = \mathit{cond}\bigl(\mathit{lessp}(\mathit{car}(x), \mathit{cdr}(x)),\; x,\; \mathit{cons}(\mathit{cdr}(x), \mathit{car}(x))\bigr). $$
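The extracted program can be run, and checked against the specification theorem, in a short Python sketch. Here a LISP dotted pair is modelled as a Python 2-tuple and nil as `None`; these encodings are assumptions of the sketch. The functions mirror the axioms above, and `meets_spec` checks the theorem's two implications for one particular x and y.

```python
NIL = None  # stands in for LISP nil


def cons(a, b):
    return (a, b)  # a dotted pair (a . b) modelled as a 2-tuple


def car(pair):
    return pair[0]


def cdr(pair):
    return pair[1]


def lessp(a, b):
    # Axiom 5: lessp(a, b) is non-nil exactly when a < b.
    return True if a < b else NIL


def cond(test, then_value, else_value):
    # Axioms 3 and 4. Unlike LISP, Python evaluates both branches first,
    # which is harmless here because neither branch has side effects.
    return else_value if test is NIL else then_value


def sort_pair(x):
    """The extracted program: y = cond(lessp(car(x), cdr(x)), x, cons(cdr(x), car(x)))."""
    return cond(lessp(car(x), cdr(x)), x, cons(cdr(x), car(x)))


def meets_spec(x, y):
    """The specification theorem, checked for one particular x and y."""
    already_sorted = not (car(x) < cdr(x)) or y == x
    swapped = not (car(x) >= cdr(x)) or (car(x) == cdr(y) and cdr(x) == car(y))
    return already_sorted and swapped


print(sort_pair(cons(3, 1)), sort_pair(cons(1, 3)))  # (1, 3) (1, 3)
```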


In general, programs to be synthesized will not be as simple as the one above. One of
the major problems that more complicated programs introduce is that they require some form
of iteration or recursion for their solution. To form a recursive program, one needs the proper
induction axioms for the problem. A general schema for the induction axiom sufficient for
most programs is (Green, 1969):


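$$
\Big[ P(h(\mathrm{nil}), \mathrm{nil}) \wedge \forall x\, \big[ \neg \mathrm{ATOM}(x) \wedge P(h(\mathrm{cdr}(x)), \mathrm{cdr}(x)) \supset P(h(x), x) \big] \Big] \supset \forall z\, P(h(z), z)
$$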


where P is any predicate and h is any function. Somehow this predicate and function
must be determined. Requiring the user to supply the induction axioms for each program to be
synthesized somewhat defeats the purpose of the synthesis, yet having the system
generate induction axioms until one of them works takes up far too much time and memory.
Systems that determine P and h usually use various heuristics to limit the search.
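The following illustration is not part of Green's system. It reads the schema as a recipe for structural recursion on lists: the base case fixes h(nil), and the step case builds h(x) from h(cdr(x)) whenever x is not an atom. The sketch makes these assumptions:

- LISP lists are modeled as Python lists, with `[]` playing nil;
- `list_recursion` is a hypothetical helper.

The two instances only show how choosing a base value and a step function determines h.

```python
from typing import Any, Callable, List


def list_recursion(base: Any, step: Callable[[Any, Any], Any]) -> Callable[[List[Any]], Any]:
    """Build h with h(nil) = base and h(x) = step(car(x), h(cdr(x)))."""

    def h(x: List[Any]) -> Any:
        if not x:  # x is nil: the base case of the induction
            return base
        # x is not an atom: reuse the result for cdr(x), as the step case does
        return step(x[0], h(x[1:]))

    return h


# Choosing base and step fixes the function h.
length = list_recursion(0, lambda head, rest: 1 + rest)
reverse = list_recursion([], lambda head, rest: rest + [head])

print(length(["A", "B", "C"]))   # 3
print(reverse(["A", "B", "C"]))  # ['C', 'B', 'A']
```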

There are several constraints inherent in the theorem-proving approach. First, for
more complicated programs, it is often more difficult to specify programs correctly in the
predicate calculus than it is to write the program itself. Second, the domain must be
axiomatized completely; that is, one must give enough axioms to the theorem prover so that
any statement that is true of the various functions and predicates that occur in the
specification of the program can actually be proved from the axioms. Otherwise, the theorem
prover may fail to produce a proof, and thereby fail to produce the program. Third, present
theorem provers lack the power to produce proofs for the specifications of very complicated
programs. To summarize, the user must fully and correctly specify the desired program, the
theorem prover must be given enough axioms so that the specification is provable, and the
theorem prover must be strong enough to prove the specification.

It should be noted that this approach does not allow partial specification: users cannot
specify the program partially and have the system help them fill in the details. On the other
hand, when a theorem prover does succeed in producing a proof of the specification, the
correctness of the extracted program is guaranteed. Thus, AP systems might incorporate
theorem proving where it is either convenient or where correctness is an important requisite.


Program Transformation

The transformation approach is used to automatically convert an easily written, easily
understood LISP function into a more efficient, but perhaps convoluted, program. One such
system, described in Darlington & Burstall (1973), performs recursion removal, elimination
of redundant computation, expansion of procedure calls, and reuse of discarded list cells.
Recursion removal transforms a recursive program into an iterative one. The iterative
version is generally more efficient because it avoids the overhead of the stacking mechanism.
Candidates for recursion removal are found by pattern matching parts of the program against
the input pattern of a recursive schema. If the match succeeds and certain preconditions are
met, the program is replaced by an iterative schema. A simple example of such a
transformation rule is:

input pattern:   f(x) <= if a then b else h(d, f(e));
precondition:    h is associative; x does not occur free in h;
result pattern:  f(x) <= if a
                         then result ← b
                         else begin
                             result ← d;
                             x ← e;
                             while not a
                             do begin
                                 result ← h(result, d);
                                 x ← e
                             end;
                             result ← h(result, b)
                         end

where a, b, d, e, f, and h in the input pattern are matched against arbitrary expressions
in the candidate function. For example, the function

FACTORIAL(x) <= if x = 1 then 1 else TIMES(x, FACTORIAL(x-1))


would match the above input pattern with f = FACTORIAL, a = (x = 1), b = 1, h = TIMES, d
= x, and e = (x-1). The resulting program would be the result pattern with these values
substituted for a, b, d, e, f, and h.
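The rule can be exercised in Python. This sketch assumes:

- the pattern variables a, b, d, and e are functions of x;
- h is a binary function;
- `recursive_schema` and `iterative_schema` are hypothetical helper names, not part of Darlington and Burstall's system.

Instantiating both schemas with the FACTORIAL bindings shows that the iterative form computes the same values without a growing stack.

```python
def recursive_schema(a, b, d, e, h):
    """Input pattern: f(x) <= if a then b else h(d, f(e))."""

    def f(x):
        return b(x) if a(x) else h(d(x), f(e(x)))

    return f


def iterative_schema(a, b, d, e, h):
    """Result pattern of the rule; valid only when h is associative."""

    def f(x):
        if a(x):
            return b(x)
        result = d(x)
        x = e(x)
        while not a(x):
            result = h(result, d(x))
            x = e(x)
        return h(result, b(x))

    return f


# Bindings from the FACTORIAL match: a = (x = 1), b = 1, h = TIMES, d = x, e = (x - 1).
bindings = dict(
    a=lambda x: x == 1,
    b=lambda x: 1,
    d=lambda x: x,
    e=lambda x: x - 1,
    h=lambda u, v: u * v,
)
fact_rec = recursive_schema(**bindings)
fact_iter = iterative_schema(**bindings)

print(fact_rec(5), fact_iter(5))  # 120 120
# fact_rec(5000) would exceed CPython's default recursion limit;
# fact_iter(5000) runs in constant stack depth.
```

Associativity is what licenses regrouping h(d0, h(d1, ... h(dn, b))) into the left-to-right accumulation the loop performs.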

Eliminating redundant computations includes traditional common-subexpression elimination
as well as combining loops that iterate over the same range. The latter includes implicit
iteration. Thus, if A, B, and C are represented as linked lists, the sequence

X ← INTERSECTION(A, B)

Y ← INTERSECTION(A, C)


is really two implicit iterations, each over the set A. A suitable transformation rule would
convert these into a single iteration over the set A.
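Here is a before-and-after sketch of this rule in Python. It assumes sets are represented as Python lists, standing in for linked lists. `intersection` is a hypothetical helper that scans its first argument. The fused version walks A once and tests each element against both B and C.

```python
from typing import List, Tuple


def intersection(s: List[str], t: List[str]) -> List[str]:
    # Implicit iteration over s, keeping the elements also found in t.
    return [x for x in s if x in t]


def separate(a: List[str], b: List[str], c: List[str]) -> Tuple[List[str], List[str]]:
    # Original sequence: two implicit iterations, each over A.
    x = intersection(a, b)
    y = intersection(a, c)
    return x, y


def fused(a: List[str], b: List[str], c: List[str]) -> Tuple[List[str], List[str]]:
    # Transformed sequence: one explicit iteration over A builds both results.
    x: List[str] = []
    y: List[str] = []
    for element in a:
        if element in b:
            x.append(element)
        if element in c:
            y.append(element)
    return x, y


A, B, C = ["p", "q", "r", "s"], ["q", "s"], ["p", "q"]
print(separate(A, B, C))                    # (['q', 's'], ['p', 'q'])
print(fused(A, B, C) == separate(A, B, C))  # True
```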

Expanding procedure calls generally involves substituting the body of a procedure for
each of the calls to it. The potential benefit arises from simplifications made possible by use
of the local context. This technique is the starting point for a general class of
transformations explored in Burstall & Darlington, 1976, and Wegbreit, 1976a.

Program transformation is also used to convert very high-level specifications into
target-language implementations (see D6 and D9, as well as the summaries of these articles in A).
Knowledge Engineering

AP systems are said to be "knowledge-based" when they are built by identifying and
codifying the knowledge that is appropriate for program synthesis and understanding
(i.e., the ability to manipulate and analyze programs) and by embedding this knowledge in some
representation. Many of these systems use large amounts of many kinds of knowledge to
analyze, modify, and debug large classes of problems. While the distinction is relative, it is
possible to divide this knowledge into two types: programming knowledge and domain
knowledge.

Programming knowledge includes both programming-language knowledge, which is
knowledge about the semantics of the target language in which the system will write the
desired program, and general programming knowledge, which is knowledge about such
things as generators, tests, initialization, loops, sorting, searching, and hashing. Programming
knowledge also includes (a) optimization techniques, (b) high-level programming constructs
(loops, recursion, branching), and (c) strategy and planning techniques.

Domain knowledge is what a system needs in order to infer how to go from the
problem description or specification of a program in a certain program class (for example,
symbolic computation) to what needs to be done to solve the problem. This "know-how"
includes how to structure the concepts in the domain or problem area and find
interrelationships among them. It must also include knowledge about how to achieve certain
results in the problem domain (cf. HACKER's learning of procedures, Problem Solving.B5).
Moreover, it should be able to define the problem in alternative ways and find alternative
ways to solve the task; such knowledge represents an "understanding" of the domain.

Knowledge-based systems need a method of reasoning. Since they are not restricted
to the traditional formalisms of logic, they often supply their own flexible reasoning
techniques for guiding the synthesis. Some of these techniques include inference, program
simplification, illustration and simplification for the user, decision trees, problem-solving
techniques, and refinement.

The basic concern in representing the knowledge is that it be structured
so that the search for relevant facts does not cause a combinatorial explosion.
Various representations employed include:

-- PLANNER-like procedural experts (AI Languages.C1),

-- Refinement rules (D5),

-- Modular, frame-like experts (OWL (Martin, 1974)
and BEINGS (Lenat, 1976)),

-- Semantic nets (D8), and

-- Amorphous systems that try several ad hoc techniques
(Biggerstaff, 1070).


Methods of accessing knowledge bases include: pattern invocation (Article D5); "when
needed" (Sussman, 1976); frame relations and assertions, including filling in process models
(Martin, 1974; Green, 1000; Lenat, 1976; see Articles D6, D6, and D3); and subgoal or case
analysis (Green, 1977; see also D6).
Automatic Data Selection

This approach refers to the selection of efficient low-level data-structure
implementations for a program specified in terms of high-level abstract information structures
(e.g., sets). Programming languages that contain abstract data types generally give them default
representations that are a compromise among all likely uses of the structures. As a result, these data
types are typically far from efficient in any one particular program. A system with
automatic data selection, however, would choose, from a collection of possible implementations,
one that is more efficient for the particular program under consideration. For example,
the abstract data type set could be represented in low-level implementations as a linked list,
a binary tree, a hash table, a bit string, or property-list markings.

Various operations on sets are easier in one representation than in another. For example, set
intersection using bit strings is simply a logical AND operation, while iteration over a set is
easier when it is represented as a linked list. Some representations may not even be applicable
in a given case: bit strings require that the domain of set elements be fixed and reasonably
small, since one bit position is used for each possible element. Also, some representations may
not permit all needed operations; the only way to enumerate the items in a set represented with
property markings, for instance, is to enumerate all atoms in the system. By tailoring the
representation to the particular programmer's intention, it is possible to produce much better code.
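A small sketch makes the trade-off between representations concrete. It assumes a fixed, small universe of elements, which is the precondition the text gives for bit strings. A set is then encoded as a Python integer with one bit per possible element, so intersection becomes a single bitwise AND. The same set held as a list must be scanned element by element. The universe and the helper names are illustrative assumptions.

```python
from typing import Dict, List

UNIVERSE = ["red", "green", "blue", "cyan", "magenta"]  # fixed, small domain
BIT: Dict[str, int] = {name: 1 << i for i, name in enumerate(UNIVERSE)}


def to_bits(elements: List[str]) -> int:
    # One bit position per possible element of the universe.
    bits = 0
    for e in elements:
        bits |= BIT[e]
    return bits


def from_bits(bits: int) -> List[str]:
    # Enumerating a bit-string set means testing every possible element.
    return [e for e in UNIVERSE if bits & BIT[e]]


def list_intersection(s: List[str], t: List[str]) -> List[str]:
    # Linked-list style: iterate over s and search t for each element.
    return [e for e in s if e in t]


s = ["red", "blue", "cyan"]
t = ["blue", "cyan", "magenta"]
print(from_bits(to_bits(s) & to_bits(t)))  # ['blue', 'cyan']
print(list_intersection(s, t))             # ['blue', 'cyan']
```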

One such system performing data-structure selection for the user is described in Low, 1974, and
Low, 1978. This system handles simple programs written in LEAP, a sublanguage of SAIL. It
selects representations for sets, sequences, and relations from the fixed library of low-level
data structures available in LEAP. The selection is guided by the goal of minimizing the
product of the memory and time required to execute the resulting program.

The system begins with an information-gathering phase that searches out the relevant
characteristics of the program's data structures, such as their expected size and number, the
operations performed on them, and their interactions. Some of this information is obtained by
questioning the user, and some is obtained by monitoring the actual execution of the program
on typical data, using default representations for each structure. The system then partitions
into equivalence classes the variables whose values will be of the same type of data
structure. It employs a method similar to hill climbing (see Article Search.Overview)
to determine a good assignment of data structures to the equivalence classes: the
representations assigned to the equivalence classes are repeatedly varied, one at a
time, to see if an improvement results. For further details, see the above references.
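The search over assignments can be sketched as follows. This is not Low's system, and several parts are assumptions:

- the equivalence classes, the candidate representations, and the per-class (time, space) costs are made-up numbers;
- the total cost is taken to be the product of summed time and summed space.

The loop varies one class at a time and keeps any change that lowers the cost. Like hill climbing, it stops at a local minimum.

```python
from typing import Dict, Tuple

# Hypothetical (time, space) estimates for each equivalence class under each
# candidate representation; real figures would come from information gathering.
COSTS: Dict[str, Dict[str, Tuple[float, float]]] = {
    "S1": {"linked_list": (9.0, 2.0), "bit_string": (2.0, 1.0), "hash_table": (3.0, 4.0)},
    "S2": {"linked_list": (4.0, 3.0), "bit_string": (1.0, 6.0), "hash_table": (2.0, 5.0)},
}


def total_cost(assignment: Dict[str, str]) -> float:
    # Objective: product of total time and total memory.
    time = sum(COSTS[c][rep][0] for c, rep in assignment.items())
    space = sum(COSTS[c][rep][1] for c, rep in assignment.items())
    return time * space


def hill_climb(assignment: Dict[str, str]) -> Dict[str, str]:
    current = dict(assignment)
    improved = True
    while improved:
        improved = False
        for cls in COSTS:               # vary one class at a time
            for rep in COSTS[cls]:
                candidate = {**current, cls: rep}
                if total_cost(candidate) < total_cost(current):
                    current, improved = candidate, True
    return current


start = {"S1": "linked_list", "S2": "linked_list"}  # default representations
best = hill_climb(start)
print(best, total_cost(best))  # {'S1': 'bit_string', 'S2': 'bit_string'} 21.0
```

With only two classes the result can be checked by hand. In general, hill climbing offers no guarantee of reaching the global optimum.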

Other AP systems are also concerned with the selection of an efficient set of data
structures or file structures, but this concern is part of the general goal of writing an
efficient program (see Articles D7 and D9).


Traditional Problem Solving

Traditional problem solving refers to using goals to direct the application of operations
in a state space (see Search). The Heuristic Compiler (Simon, 1972) regards the task of
writing a program as a problem-solving process using heuristic techniques like those of GPS
(see Article Search.D5). This pioneering work recognized the value of both a state language,
to describe problem states and goals, and a process language, to represent the solver's
actions.
In the Heuristic Compiler, the State Description Compiler is quite similar to later work on
synthesis from examples. The program being synthesized is defined by specifying
input/output conditions on the memory cells that it affects. The difference between the
current state and the desired state is looked up in a table that specifies which operators to
apply to transform the contents of the cells appropriately. The Functional Description
Compiler is an important precursor to later work in automatic modification and debugging of
programs. It uses means-ends analysis to transform a known (compiled) routine into a new
(desired) routine.
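The table-driven step of the State Description Compiler can be illustrated with a small difference-reduction loop. This is an assumption-laden illustration, not Simon's program:

- memory cells are modeled as a Python dict;
- the two kinds of difference and the `copy`/`store` operators are hypothetical;
- the difference table simply maps each kind of difference to the operator that removes it.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, int]
Op = Tuple[str, str, object]      # (operator, target cell, argument)


def differences(current: State, goal: State) -> List[Tuple[str, str]]:
    # Compare each cell named in the goal with its current contents.
    diffs = []
    for cell, wanted in goal.items():
        if current.get(cell) != wanted:
            kind = "value-elsewhere" if wanted in current.values() else "value-missing"
            diffs.append((kind, cell))
    return diffs


def copy_op(state: State, cell: str, goal: State) -> Op:
    # Copy the wanted value from some cell that already holds it.
    source = next(c for c, v in state.items() if v == goal[cell])
    return ("copy", cell, source)


def store_op(state: State, cell: str, goal: State) -> Op:
    # Store the wanted value as a constant.
    return ("store", cell, goal[cell])


# Difference table: kind of difference -> operator that removes it.
TABLE: Dict[str, Callable[[State, str, State], Op]] = {
    "value-elsewhere": copy_op,
    "value-missing": store_op,
}


def apply_op(state: State, op: Op) -> State:
    name, cell, arg = op
    new_state = dict(state)
    new_state[cell] = state[arg] if name == "copy" else arg
    return new_state


def synthesize(initial: State, goal: State) -> List[Op]:
    # Repeatedly look up the first remaining difference in the table.
    state, program = dict(initial), []
    while diffs := differences(state, goal):
        kind, cell = diffs[0]
        op = TABLE[kind](state, cell, goal)
        program.append(op)
        state = apply_op(state, op)
    return program


print(synthesize({"A": 1, "B": 2}, {"A": 2, "C": 7}))
# [('copy', 'A', 'B'), ('store', 'C', 7)]
```

Each operator writes only to a cell that currently differs from the goal, so every step fixes one goal cell and the loop terminates.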

HACKER, a system described by Sussman (1976), adds to Simon's work by detecting and
generalizing new differences (bugs) and defining appropriate operators to resolve them
(patches). This system uses many significant AI techniques and language features:
learning through practice how to write and debug programs; modular, pattern-invoked expert
procedures (chunks of procedural knowledge); and hypothetical world models for subgoal
analysis. Three aspects of Sussman's work are interesting directions in the development of
automatic programming. The first is his emphasis on generalizing from experience (trying old
techniques in new situations). The second is his acceptance of the fact that users have an
incomplete understanding of the desired program. The third is his goal-purpose annotation technique.

However, HACKER's preference for ruthless generation of "buggy" code without
detailed planning has led to inadequate handling of subgoal conflicts. The user must
carefully schedule the training sequences and be ready for the combinatorial explosion as
the system exhaustively searches its base of world facts and programming knowledge. Such
systems must constrain the search problem of large knowledge bases. Other attempts to
distribute knowledge among interacting specialists have encountered the same difficulty
(Lenat, 1976).

We find that systems such as HACKER, which have been designed to operate like
human programmers, promise a moderate degree of success compared to knowledge-impoverished
formal methods. However, these systems are still often hampered by the rigid
formalism that governs their application: In what order are operators to be applied? How
can domain-specific information be specified as differences? The formalisms used to
incorporate the various knowledge sources in these systems seem too methodical; the
method is space- and time-bound because it is based on search.


Induction 

Induction, or inductive inference, refers to the system's "educated guess" at what the
user wants on the basis of program specifications that only partially describe the program's
behavior. Such specifications are often examples of input/output pairs and program
traces, in both regular and generic form (B). For each of these kinds of specification, the
corresponding AP system must determine the general rules on the basis of a specification
that contains only a few examples (or, in the generic specifications, a limited class of
examples) of the program behavior.

The work in program synthesis from specification by examples had its origin in research
dealing with grammatical inference. The objective there was to infer a grammar that
described a language, given several examples of strings of the language (Feldman, Gips,
Horning, & Reder, 1969, and Biermann & Feldman, 1970). In a natural way, this research was
associated with the inference of finite-state machines from the sequence (string) of states
that the machine passes through during execution. The association was natural since
finite-state machines are intimately related to the grammar that generates the strings of states
that represent legal behavior of the machine (Biermann & Krishnaswamy, 1974). This
research was the basis for two new avenues of investigation: synthesis from examples and
synthesis from traces.

The crucial issue for program synthesis from examples is to develop a generalized
program, that is, one that can account for more than the examples given in the program
specification. To do this, these programs break down the input, looking for recursively
solvable subparts (Shaw, Swartout, & Green, 1976) or computation repetitions that can be
fitted into a known program scheme (Hardy, 1976).

The work in program synthesis from trace specifications seeks to invert the transformations
observed in a trace protocol to create abstractions that generalize into loops and variables
(Bauer, 1976). Of all the induction-based synthesis paradigms, it is the one closest
to grammatical inference. Biermann & Krishnaswamy (1974) have built a system that
interprets traces as directions through a developing flowchart. Phillips (1977) has
implemented a system for the inference of very high-level program descriptions from a
mixture of traces and example pairs in the context of a large automatic programming system
(D3).


All inductive inference systems depend upon a good axiomatization of operations.
In other words, the system must know about all of the possible primitive operations that can
be applied to the data structures if it is to hope to construct the desired program by
composing these primitives. Furthermore, a harmonious relation between the nature of
the constructs in the specification and the most basic constructs in the target language is
essential. For example, in Siklossy & Sykes, 1976, the tasks of tree traversal and repetitive
robot maneuvers are directly translatable into LISP recursion. Moreover, these programs are
required to know quite a bit about generalization. After synthesizing the program, they test
it on other examples, sometimes by generating test cases and sometimes by asking the user
for approval. For certain classes of programs, examples and traces provide a natural way for
the user to specify what the desired program is to do.


Induction From Input/Output Pairs

The synthesis of programs from a specification consisting of instances of input/output
pairs is strongly related to the problem domain to which these programs belong (e.g., sorting,
concept formation). A set of program schemata characterizes the entire class of programs for
the domain. These schemata are like program skeletons; they define the general structure of
a program, omitting some details. The synthesis of a program thus amounts to (a) selecting
a schema that is representative of the program specified by the set of example pairs,
and then (b) using the information present in the examples to instantiate the unfilled slots of
the schema. There are thus two steps: a classification process, which selects the general
structure (schema) of the target program, and an instantiation process, which completes the
details of the target program.

What does the classification process require? Every schema defines a subclass of
programs in the problem domain. Every set of example pairs defines a family of programs in
the domain. Thus, the classification process must associate this set of example pairs with
one of the subclasses of programs in the domain. To accomplish this task, each schema
(subclass) is given a set of characteristics that, if present in the set of
example pairs, guarantee that the set specifies a program of this type. Usually this task is
accomplished in two ways. First, a set of difference measures is provided, to be applied to the
inputs and outputs of an example pair, as well as to different example pairs in the input
collection (if it consists of more than one). Second, a set of heuristics is provided for each
program schema that determines a fit measure of the example set that accompanies it. The task of
classifying the example set then reduces simply to choosing the schema with the highest fit value.
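Classification by fit value can be sketched for a toy list-manipulation domain. Everything here is an illustrative assumption:

- the two difference measures;
- the two schema names;
- the fit heuristics, each of which simply reports the fraction of example pairs that exhibit its schema's characteristic.

```python
from typing import Callable, Dict, List, Tuple

Pair = Tuple[List[str], List[str]]  # (input list, output list)


def same_elements(pair: Pair) -> bool:
    # Difference measure: the output is a rearrangement of the input.
    inp, out = pair
    return sorted(inp) == sorted(out)


def every_other(pair: Pair) -> bool:
    # Difference measure: the output keeps every other input element.
    inp, out = pair
    return out == inp[::2]


# One fit heuristic per schema: the fraction of pairs showing its characteristic.
FIT: Dict[str, Callable[[List[Pair]], float]] = {
    "permute": lambda ex: sum(same_elements(p) for p in ex) / len(ex),
    "select-alternate": lambda ex: sum(every_other(p) for p in ex) / len(ex),
}


def classify(examples: List[Pair]) -> str:
    # Choose the schema with the highest fit value.
    return max(FIT, key=lambda schema: FIT[schema](examples))


rev = [(["A", "B", "C"], ["C", "B", "A"]), (["A", "B"], ["B", "A"])]
alt = [(["A", "B", "C", "D"], ["A", "C"]), (["P", "Q", "R"], ["P", "R"])]
print(classify(rev), classify(alt))  # permute select-alternate
```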

During the instantiation process, in addition to the difference and fit measures described
above, every schema has an associated set of rules for filling its empty slots by
extracting the necessary features from the examples. For instance, in the domain of list-manipulation
functions, some outputs contain all elements of the input and others contain only every other
element. Such cases suggest different methods of constructing the output incrementally from
the input. In the first case the function maps down the input list; in the second case, it maps
down the input using the LISP CDDR function. Slots are instantiated by these rules in terms of
primitive operators of the domain and their functional compositions (in the above case, the
basic LISP functions and their compositions).
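The instantiation rule in this example can be sketched in Python under these assumptions:

- LISP lists are modeled as Python lists;
- `cdr` and `cddr` are hypothetical stand-ins for the LISP functions;
- the "map down the list" schema has a single slot, namely the function used to advance through the input.

This simplified rule fills the slot by trying CDR and then CDDR, keeping whichever reproduces all the examples.

```python
from typing import Callable, List, Tuple

Pair = Tuple[List[str], List[str]]


def cdr(x: List[str]) -> List[str]:
    return x[1:]


def cddr(x: List[str]) -> List[str]:
    return x[2:]


def map_down(advance: Callable[[List[str]], List[str]]) -> Callable[[List[str]], List[str]]:
    # Schema: collect car(x), then continue on advance(x) until x is nil.
    def program(x: List[str]) -> List[str]:
        out: List[str] = []
        while x:
            out.append(x[0])
            x = advance(x)
        return out

    return program


def instantiate(examples: List[Pair]) -> Callable[[List[str]], List[str]]:
    # Fill the schema's one slot with a primitive that reproduces the examples.
    for advance in (cdr, cddr):
        candidate = map_down(advance)
        if all(candidate(inp) == out for inp, out in examples):
            return candidate
    raise ValueError("no instantiation of this schema fits the examples")


alternate = instantiate([(["A", "B", "C", "D", "E"], ["A", "C", "E"])])
print(alternate(["P", "Q", "R", "S"]))  # ['P', 'R']
```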

Once a schema has been selected and instantiated, the synthesis algorithm must
validate its hypothesis. This is usually done in one of two ways. The system may generate some
new examples for the program, evaluate the synthesized program on the example set, and check the
results with the user. Alternatively, it may present the program to the user and let him or her
verify its correctness.

In summary, the basic algorithm is:

(1) Apply the difference measures to the example set.

(2) Based on this application, classify the set into a particular schema class.

(3) Using heuristics associated with the particular schema, hypothesize a
complete instantiation of the selected schema.

(4) Validate this hypothesis.

In this basic algorithm, if there is a single I/O pair in the specification, the difference
measures are just a set of feature-detecting heuristics. If there is more than one pair, the
pairs may be ordered according to the complexity of the input. Difference measures fall
into two classes: those that associate the structure of a pair with a schema class, and
those that find differences between pairs. The latter are perhaps more crucial in the
inference of a program. From these differences, a theory for the operation of the program is
inductively inferred; equivalently, a formation rule is derived. This operational theory
might take the form of a certain schema class or of a recurrence equation that, in turn,
specifies a schema class. In the classification phase it may be necessary to apply the
classification rule to all pairs in order to infer the corresponding schema correctly. When
several different schemata have been inferred, a decision rule is required to select the
correct one.
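Here is a hypothetical sketch of inferring a formation rule from differences between pairs. Each pair is compared with the pair whose input is the cdr of its input. A small, invented set of candidate recurrences of the form f(x) = combine(car(x), f(cdr(x))) is then tested against every such comparison. The first candidate that is consistent with all of them is returned.

```python
from typing import Callable, Dict, List, Optional, Tuple

Pair = Tuple[List[str], List[str]]

# Hypothetical candidate recurrences relating f(x) to car(x) and f(cdr(x)).
RECURRENCES: Dict[str, Callable[[str, List[str]], List[str]]] = {
    "f(x) = cons(car(x), f(cdr(x)))": lambda head, rest: [head] + rest,
    "f(x) = append(f(cdr(x)), list(car(x)))": lambda head, rest: rest + [head],
    "f(x) = cons(car(x), cons(car(x), f(cdr(x))))": lambda head, rest: [head, head] + rest,
}


def infer_recurrence(examples: List[Pair]) -> Optional[str]:
    outputs = {tuple(inp): out for inp, out in examples}
    ordered = sorted(examples, key=lambda p: len(p[0]))  # simplest inputs first
    for name, combine in RECURRENCES.items():
        compared = 0
        consistent = True
        for inp, out in ordered:
            rest_out = outputs.get(tuple(inp[1:]))
            if not inp or rest_out is None:
                continue  # no pair whose input is cdr of this input
            compared += 1
            if combine(inp[0], rest_out) != out:
                consistent = False
                break
        if consistent and compared:
            return name
    return None


examples = [([], []), (["C"], ["C"]), (["B", "C"], ["C", "B"]), (["A", "B", "C"], ["C", "B", "A"])]
print(infer_recurrence(examples))  # f(x) = append(f(cdr(x)), list(car(x)))
```

This sketch only works when the example inputs form a chain of successive cdrs. That is one reason ordering the pairs by input complexity matters.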
An alternative approach is to reduce the whole problem to another paradigm for
synthesizing programs. For example, suppose the problem domain has been formalized, so that there
is a set of operators for the domain. Then a traditional problem solver can be used to
generate a solution to the input/output pair (considered as initial state and goal state) in the
form of a sequence of operators that carries the input into the output. The solution so
obtained can be considered a trace of the program to be synthesized, and a trace-based
paradigm may then be employed.
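The reduction to problem solving can be sketched with breadth-first search over a toy operator set. The three list operators below are assumptions chosen for illustration. The search returns a shortest operator sequence that carries the input into the output. That sequence is what a trace-based method would then generalize.

```python
from collections import deque
from typing import Callable, Dict, List, Optional, Tuple

State = Tuple[str, ...]

# A hypothetical formalized domain: operators on tuples of symbols.
OPERATORS: Dict[str, Callable[[State], Optional[State]]] = {
    "swap-first-two": lambda s: (s[1], s[0]) + s[2:] if len(s) >= 2 else None,
    "rotate-left": lambda s: s[1:] + s[:1] if s else None,
    "drop-first": lambda s: s[1:] if s else None,
}


def solve(initial: State, goal: State, max_depth: int = 6) -> Optional[List[str]]:
    # Breadth-first search from the initial state to the goal state.
    frontier = deque([(initial, [])])
    seen = {initial}
    while frontier:
        state, path = frontier.popleft()
        if state == goal:
            return path
        if len(path) >= max_depth:
            continue
        for name, op in OPERATORS.items():
            nxt = op(state)
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [name]))
    return None


trace = solve(("A", "B", "C"), ("C", "B", "A"))
print(trace)  # ['rotate-left', 'swap-first-two']
```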

Specification by examples is suitable for synthesizing a program only in those cases
where the task domain is small and easily axiomatized. It may also be feasible when
the domain is repetitious enough that a small set of pairs suffices to
specify the program, which is almost never the case in practical programming domains. Such a
specification method tends to be quite limited and does not lend itself to useful
generalization to large domains. Nevertheless, the power of examples for clarifying concepts
is unquestionable. In future automatic programming systems, the main application of this
specification formalism seems likely to be restricted to the annotation and clarification
of more formal program descriptions.


Induction From Traces

Inferring a program from a set of traces is, as mentioned earlier, very similar to inferring
a description of a finite-state machine from a set of sequential states that the machine
might pass through. The basic approach for synthesizing a program from a set of traces is to
generate, in order of increasing complexity, the possible programs constructed from the
programming-domain operators, tests, and their functional compositions. After each new
program is generated, the given traces are validated against it. If the generated
program accounts for the traces, then it is the required solution. Notice that some kind of
complexity measure is needed for the enumeration, for example program size (e.g., number of
instructions in the program).
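The generate-and-validate loop can be sketched as follows under these assumptions:

- the "programs" are straight-line sequences over a tiny, hypothetical instruction set acting on one integer register;
- a trace is simplified to the sequence of register values;
- enumeration proceeds by program length, the complexity measure suggested in the text.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Tuple

INSTRUCTIONS: Dict[str, Callable[[int], int]] = {
    "inc": lambda r: r + 1,
    "double": lambda r: r * 2,
    "square": lambda r: r * r,
}


def run(program: Tuple[str, ...], start: int) -> List[int]:
    # Execute a straight-line program, recording the register after each step.
    values = [start]
    for name in program:
        values.append(INSTRUCTIONS[name](values[-1]))
    return values


def synthesize(traces: List[List[int]], max_size: int = 4) -> Optional[Tuple[str, ...]]:
    for size in range(max_size + 1):                  # increasing complexity
        for program in product(INSTRUCTIONS, repeat=size):
            if all(run(program, t[0]) == t for t in traces):
                return program                        # accounts for every trace
    return None


traces = [[1, 2, 4, 5], [3, 4, 8, 9]]
print(synthesize(traces))  # ('inc', 'double', 'inc')
```

The first trace alone is ambiguous: 1 to 2 could be `inc` or `double`. The second trace rules out the alternatives, which shows why several traces are usually needed.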

This basic approach suffers from the problems inherent in searching a large search
space. It therefore admits improvements that reduce the combinatorial explosion, using
heuristics to prune and guide the search process. Even so, it is not generally
practical and is suited only to the inference of small programs in very simple domains.
Nevertheless, it has been applied with moderate success to the inference of programs from
memory traces. Such programs usually consist of register assignments, tests, and
memory-modification instructions, so neither they nor their traces are very complex. Programs
as complex as Hoare's FIND algorithm have been synthesized in this manner (Petry & Biermann, 1976).
These systems tend to be knowledge-impoverished, but Phillips (1977) exhibits a
methodology that compensates for this by using problem-domain or domain-specific knowledge
in the inference process. There are certain other special inference paradigms for particular
trace classes.

Program inference from protocols. Traces usually mix several kinds of information:
operations applied to data objects, results of tests as to whether predicates hold at certain
points during program execution, state snapshots of data values, and other information. Different
classes of traces arise if restrictions are placed on the kind of information that may appear
in them. Protocols are one such class. In a protocol only operation applications and data-structure
changes may appear, and there is no information about control decisions that have
been taken during the particular program execution reflected in the trace. A
typical protocol for a function that reverses a list would be:

input X
X = (A B C)
Y = (A)
X = (B C)
Y = (B A)
X = (C)
Y = (C B A)
output Y

Notice that the only information present in the protocol is operation applications and variable
state changes. All control information is omitted.
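A protocol like the one above can be produced mechanically by instrumenting an iterative list reversal. The instrumentation logs every change to X and Y but nothing about the loop test. The Python below is a hypothetical illustration of that idea. Its log format follows the protocol shown, with lists printed in LISP notation. It also records the final X = () update, which the protocol above leaves out.

```python
from typing import List


def show(items: List[str]) -> str:
    # Print a list in LISP notation, e.g. (A B C).
    return "(" + " ".join(items) + ")"


def reverse_protocol(x: List[str]) -> List[str]:
    # Reverse x iteratively, logging data changes but never control decisions.
    protocol = ["input X", f"X = {show(x)}"]
    y: List[str] = []
    while x:                       # the loop test itself is not recorded
        y = [x[0]] + y             # Y <- cons(car(X), Y)
        protocol.append(f"Y = {show(y)}")
        x = x[1:]                  # X <- cdr(X)
        protocol.append(f"X = {show(x)}")
    protocol.append("output Y")
    return protocol


print("\n".join(reverse_protocol(["A", "B", "C"])))
```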

The inference of a program from a collection of protocols involves two phases:
(a) constructing a program description that captures the nature of the program and that could
have generated a subset of the input protocols; and (b) modifying the program description as
more protocols become available, in order to validate them.

A natural algorithm would then be to hypothesize an initial description, by some feature
classification process or with the aid of a domain knowledge base, and then debug it by
forcing a unification of the protocol family with the description. The construction of the
initial program description can be described as follows:

(1) Match the protocols, that is, find common segments as well as differences by
matching their structure.

(2) Find substitutions that unify these protocols. Protocols may differ in variables
that have different names, in the same data objects (at the same place in
the protocols) having different values, and in the operations
that occur. The matching phase produces a set of such differences. The
substitution phase finds substitutions that remove these differences. For
example, if two protocols refer with different variable names to the same
data object, this phase would propose a common name for the two variables.

Such substitutions usually take the form constant -> variable or
variable-name -> variable-name.

(3) Inductively form loops by detecting repeated equivalent subprotocols. Loop
formation is the basic inductive step of this approach.

For example,

    protocol string: ABCDABCD
    hypothesized loop:
        while <condition>
        do begin
            A;
            B;
            C;
            D;
        end;
Since there are infinitely many loop hypotheses for a given protocol, one of
the tasks of the system designer is to provide a good set of heuristics to
guide the search process during loop formation. For example, one possible
heuristic could be to consider first the loops with minimal nesting
level.

(4)  Generalize  remaining  constants  to  variables. 
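A minimal Python sketch of loop formation, assuming a protocol is simply a sequence of operation names (such as the string ABCDABCD above) and considering only single-level loops, in line with the minimal-nesting heuristic. The function name `hypothesize_loops` is hypothetical. Restricting hypotheses to whole repetitions of one body is what keeps the set finite here, and the sketch does not infer the loop's exit condition.

```python
def hypothesize_loops(protocol):
    """List single-level loop hypotheses for a protocol, shortest body first.

    A body qualifies if repeating it a whole number of times reproduces the
    protocol exactly. The last hypothesis is always the trivial one: the
    whole protocol executed once.
    """
    n = len(protocol)
    hypotheses = []
    for size in range(1, n + 1):
        if n % size:
            continue  # the body must divide the protocol evenly
        body = protocol[:size]
        if body * (n // size) == protocol:
            hypotheses.append((body, n // size))
    return hypotheses


if __name__ == "__main__":
    for body, times in hypothesize_loops("ABCDABCD"):
        print(f"loop body {body} executed {times} time(s)")
```

For ABCDABCD this yields the body ABCD executed twice, matching the hypothesized loop, followed by the trivial hypothesis.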


At this stage, then, a description has been generated in which all data object snapshots
have an associated variable name and loop structures in the program have been
inferred. The result of this matching, unification, and abstraction (generalization) process is
a semantic net representation of the program.
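The matching and substitution steps can be sketched as follows, under strong simplifying assumptions: the two protocols are already aligned step for step, and each step is an (operation, argument) pair. Arguments that differ between the protocols are replaced by a shared variable (constant -> variable), and the same pair of differing constants always receives the same variable. The function name and the variable names v1, v2, ... are hypothetical.

```python
def unify_protocols(p1, p2):
    """Unify two aligned protocols of (operation, argument) steps.

    Returns (description, substitutions, operation_differences):
    differing arguments become shared variables, while differing
    operations are reported and marked "?" in the description.
    """
    if len(p1) != len(p2):
        raise ValueError("this sketch only unifies protocols of equal length")
    description, substitutions, op_differences = [], {}, []
    for i, ((op1, arg1), (op2, arg2)) in enumerate(zip(p1, p2)):
        if op1 != op2:
            # Operation differences are left for later loop formation or debugging.
            op_differences.append((i, op1, op2))
            description.append(("?", None))
        elif arg1 == arg2:
            description.append((op1, arg1))
        else:
            # constant -> variable; the same pair of constants reuses its variable.
            key = (arg1, arg2)
            if key not in substitutions:
                substitutions[key] = f"v{len(substitutions) + 1}"
            description.append((op1, substitutions[key]))
    return description, substitutions, op_differences
```

With the protocols [("read", "x"), ("add", 3), ("print", 3)] and [("read", "x"), ("add", 5), ("print", 5)], both occurrences of the differing constants map to the single variable v1.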

The  next  stage  is  to  verify  that  the  hypothesized  program  description  agrees  with  any 
additional  protocols,  and  if  this  is  not  the  case,  to  modify  it.  This  correction  (debugging) 
phase  can  be  described  as  follows: 

(1) Try to validate new protocols against the program representation, i.e., to
symbolically execute the program description to see if it can account for
the given protocol.

(2) Find any differences between the predicted and actual protocols. The symbolic
evaluation process generates a set of differences that are due to the
protocol's not matching the program description. This set of differences
suggests the kinds of modifications that must be made to the description.

(3) Form a theory for the difference. That is, hypothesize a suitable change to
the program description that removes the particular difference. One way
of accomplishing this is to use a classification process similar to the
basic algorithm for inference from examples.

(4) Modify the program representation accordingly.
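Steps (1) and (2) of the debugging phase can be illustrated with a deliberately simple description: a single loop whose body is repeated until the protocol ends. Symbolic execution then reduces to predicting each step from the body, and the differences are the positions where prediction and protocol disagree. This is a hypothetical sketch; forming a theory for each difference (step 3) is left out.

```python
from itertools import cycle, islice


def find_differences(body, protocol):
    """Compare a protocol with the steps predicted by a one-loop description.

    Returns (position, predicted, actual) triples for every mismatch, plus a
    final entry if the protocol stops partway through an iteration.
    """
    if not body:
        raise ValueError("the loop body must contain at least one step")
    # "Execute" the description: repeat the body for as many steps as observed.
    predicted = list(islice(cycle(body), len(protocol)))
    differences = [
        (i, want, got)
        for i, (want, got) in enumerate(zip(predicted, protocol))
        if want != got
    ]
    if len(protocol) % len(body):
        differences.append((len(protocol), "end of iteration", "end of protocol"))
    return differences
```

For example, find_differences("ABCD", "ABCDABED") reports one difference at position 6 (predicted C, observed E), which might suggest a conditional inside the loop body.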

This synthesis paradigm works only for complete protocols, that is, protocols in which all
data structure changes appear explicitly. Phillips (1977) has proposed a procedure for
handling incomplete protocols in a unified framework for synthesis from examples and
synthesis from traces or protocols. The procedure is basically as follows: for those
segments of a protocol where operations are missing, that is, where two states of a data
structure appear without intervening operations, the examples component of the system
infers a piece of program description (i.e., a sequence of operations) that can take the data
object from one state to the other. This program description is simply the sequence of
missing operation applications. Merging all such sequences with the original incomplete
protocol transforms it into a complete protocol, and the above algorithm for dealing with
complete protocols can then be used.
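A minimal sketch of how missing operations might be filled in, assuming a toy domain in which the data object is an integer and the only operations are the hypothetical `inc` and `double`. A breadth-first search stands in for the examples component; it is not Phillips's actual procedure. Protocol items are ("state", value) or ("op", name) pairs, and the inferred operations, together with the intermediate states they produce, are merged back so that every data structure change appears explicitly.

```python
from collections import deque

# Hypothetical toy domain: the data object is an integer and these are
# the only operations the system knows about.
OPERATIONS = {
    "inc": lambda x: x + 1,
    "double": lambda x: 2 * x,
}


def infer_operations(start, goal, max_depth=6):
    """Find a shortest operation sequence taking start to goal.

    Returns a list of (operation, resulting state) pairs, or None if no
    sequence of at most max_depth operations exists.
    """
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, path = frontier.popleft()
        if state == goal:
            return path
        if len(path) == max_depth:
            continue
        for name, operation in OPERATIONS.items():
            successor = operation(state)
            if successor not in seen:
                seen.add(successor)
                frontier.append((successor, path + [(name, successor)]))
    return None


def complete_protocol(protocol):
    """Insert inferred operations wherever two states appear back to back."""
    if not protocol:
        return []
    completed = [protocol[0]]
    for previous, item in zip(protocol, protocol[1:]):
        if previous[0] == "state" and item[0] == "state":
            steps = infer_operations(previous[1], item[1])
            if steps is None:
                raise ValueError(f"no operations lead from {previous[1]} to {item[1]}")
            # Make intermediate states explicit; the final state is `item` itself.
            for name, state in steps[:-1]:
                completed += [("op", name), ("state", state)]
            if steps:
                completed.append(("op", steps[-1][0]))
        completed.append(item)
    return completed


if __name__ == "__main__":
    print(complete_protocol([("state", 1), ("op", "inc"), ("state", 2), ("state", 8)]))
```

Here the gap between the states 2 and 8 is filled with two applications of `double`, passing through the state 4.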

Problem-solver-generated traces. If the domain is fully axiomatized, as may be the
case for simple domains like those for robots, it may be possible to synthesize programs from
example pairs by using a problem solver that produces a solution to the input pair in the form
of a trace.

(1) Synthesize a trace from the example pair via the problem solver.

(2) Using the trace, a set of program schemas for the domain, and a set of
schema selection and instantiation heuristics that operate on trace steps,
produce a program, in terms of domain operators and domain predicates, that
explains the example pair.
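The two steps can be sketched in a toy, fully axiomatized domain: a robot on a line whose position is changed by the operators `left` and `right`. The breadth-first problem solver, the single loop schema, and the goal predicate `at_goal` are all hypothetical illustrations, not the design of a particular system.

```python
from collections import deque

# Hypothetical domain: the state is the robot's position on a line.
OPERATORS = {"left": lambda p: p - 1, "right": lambda p: p + 1}


def solve(start, goal, limit=20):
    """Step (1): breadth-first problem solver returning a trace of operators."""
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        position, trace = frontier.popleft()
        if position == goal:
            return trace
        if len(trace) >= limit:
            continue
        for name, operator in OPERATORS.items():
            successor = operator(position)
            if successor not in seen:
                seen.add(successor)
                frontier.append((successor, trace + [name]))
    return None


def instantiate_schema(trace):
    """Step (2): choose and instantiate a program schema from the trace."""
    if trace is None:
        raise ValueError("the problem solver found no trace")
    if not trace:
        return "skip"
    if len(set(trace)) == 1:
        # Loop schema: repeat the single operator until the goal predicate holds.
        return f"while not at_goal(): {trace[0]}()"
    # Fallback schema: the trace steps as a straight-line sequence.
    return "; ".join(f"{step}()" for step in trace)


if __name__ == "__main__":
    print(instantiate_schema(solve(2, 5)))
```

For the example pair (2, 5) the solver's trace is three `right` steps, which the loop schema generalizes to a program that does not depend on the particular distance.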

All these paradigms work only for complete traces and protocols. The problem of
program inference from incomplete specifications is still under investigation. It is possible
that the techniques outlined may be extended to cover the incomplete case by coupling the
program synthesizer to a domain-based theory formation module that could, so to speak, "fill
in" the missing elements of the original specification. At that point, the methodology
discussed above could be used.

Traces share the limitations inherent in informal program specifications, namely, the
difficulty of specifying the required program uniquely, given the limited amount of
information conveyed to the synthesizer. Thus, the problem of choosing a good description is
left as a burden to the user. This problem might be alleviated by giving the synthesizer greater domain
expertise, so that it produces the program that more nearly resembles the user's desired result.

Traces and other informal specification methods will be useful for algorithm description and
correction in future automatic programming systems, because these methods closely reflect
the form in which humans understand and describe programs.
Current applications include the synthesis of calculator-like programs from memory-register
traces (Biermann & Krishnaswamy, 1974).
C. PSI

The PSI system is being developed by Cordell Green and his colleagues at Systems
Control, Inc., and at Stanford; people who contributed ideas and actually worked on the
project include David Barstow, Avra Cohn, Richard P. Gabriel, Jerold Ginsparg, Elaine Kant,
Beverly I. Kedzierski, Juan Ludlow, Bruce Nelson, Tom Pressburger, Jorge V. Phillips, Louis
Steinberg, Steve T. Tappel, Ronny Van Den Heuval, and Stephen J. Westfold. The goal of
the system is the integration of the more specialized methods of automatic programming into
a total system. Such a system would incorporate specification by examples, by traces, or
by interactive natural language dialogue; knowledge engineering; model acquisition; program
synthesis; and efficiency analysis. Research objectives include the organization of such a
system, the determination of the amount and type of knowledge such a system would require,
and the representation of this knowledge.

The program is specified by means of an interactive, mixed-initiative dialogue, which
may include as a subpart specification by example or trace. Plans are also underway
to add specification by means of a loose, very high-level language. The different
specification methods can usually be intermixed.

When the specification is an interactive natural language dialogue, the user furnishes both
a description of what the desired program is to do and an indication of the overall control
structure of the program.


The problem area of PSI is symbolic computation, including list processing, searching
and sorting, data storage and retrieval, and concept formation.

The overall operation of the system, illustrated in Figure 1, may be divided into two
phases: acquisition of a description of the program, and synthesis of the program. During the
acquisition phase, several modules of the system (including the parser/interpreter,
example/trace, explainer, and moderator experts) jointly interact with the user to obtain and
construct a net, called the program net, that describes the desired program. Then the
program model-builder module converts the net into a complete and consistent description of
the program. Afterwards, during the synthesis phase, the coding and efficiency modules,
interacting with each other, convert the program model, through the use of repeated
transformations, into an efficient program written in the target language.
Figure 1: Major paths of information flow in PSI. [Diagram: the USER supplies English sentences, loose very high-level language statements, and input-output pairs and traces, from which the PROGRAM MODEL is built; the coder and efficiency expert turn the model into a HIGH-LEVEL LANGUAGE PROGRAM, which a conventional compiler translates into a MACHINE LANGUAGE PROGRAM.]

There were three reasons for separating the operation into acquisition and synthesis
phases. First, the problems of designing such a system are more tractable because of the
separation. Second, it was envisioned that code generators for different target languages
and domain experts for different problem areas could be implemented, resulting in a versatile
modular system. Third, acquisition requires interaction with the user, whereas, in PSI,
synthesis does not.

In the overall operation, two of the primary interfaces within the PSI system are the
program net and the program model. Both are very high-level program and data structure
description languages; the program net forms a looser description of the program than does
the program model. Fragments of the program net can be accessed in the order of
occurrence in the dialogue, rather than in execution order, which allows a less detailed, local,
and partial specification of the program. Since these fragments correspond rather closely to
what the user says, they ease the burden on the parser/interpreter as well as on the
example/trace inference module. Unlike the program net, the program model includes
complete, consistent, and interpretable very high-level algorithmic and information structures.
The program model is described further in the section on the model builder below.

The remainder of this article briefly describes the PSI modules, presents the status of
PSI, and then describes several examples (Figures 2 through 6) from the acquisition phase.
These include a specification by interactive natural language dialogue, the resulting
program net and model, and a specification by trace.


Experts 

PSI is a knowledge-based system organized as a set of closely interacting modules,
also called experts. These experts include the parser/interpreter expert, the explainer expert,
the dialogue-moderator expert, the applications domain expert, the example/trace inference
expert, the program model-building expert, the coding expert, and the algorithm analysis and
efficiency experts.


Parser/Interpreter 

In the acquisition phase, the parser/interpreter expert (Ginsparg, 1978) first parses
sentences and then interprets these parses into less linguistic and more program-oriented
terms, which are then stored in the program net. This expert efficiently handles a very large
English grammar and has knowledge about data structures (e.g., sets, records), control
structures (e.g., loops, conditionals, procedures), and more complicated algorithmic ideas (e.g.,
interchanges between the user and the desired program, set construction, quantification).
The parser/interpreter can sometimes assign a concept to an unknown word on the basis of
the context in which the word appears.


Dialogue  Moderator  Expert 

This expert (Steinberg, 1976) models the user, the dialogue, and the state of the
system and selects appropriate questions and statements to present to the user. It also
determines whether the user or the expert has the initiative, at what level, and on what
subject, and attempts to keep PSI and the user in agreement on the current topic. It
provides review and preview when the topic changes. This expert decides which of the
many questions being asked by the other experts should be passed on to the user. Since
experts phrase questions in an internal form based on relations, the dialogue-moderator
expert gives questions to the explainer expert, which in turn converts them into English and
gives them to the user.


Explainer  Expert 

The explainer expert, developed by Richard Gabriel, phrases questions in terms that
the user finds meaningful (i.e., in terms related to the problem domain and the previous
sentences in the dialogue), rather than in the more programming-oriented terms used in
the program net or by the model builder. For example, rather than asking for the definition of
"A0018," PSI asks what it means for "a scene to fit a concept." The explainer also
generates English descriptions of the net.

Example/Trace  Expert 

PSI also allows specification by traces and examples, since these are useful for
inferring data structures and simple spatial transformations. This expert (Phillips, 1977)
handles simple loop and data structure inference and uses several of the techniques
discussed in the last three articles. The final section of this article illustrates how the PSI
user can specify part of a program using traces.

Domain  Expert 

The  domain  expert,  developed  by  Jorge  Phillips,  uses  knowledge  of  the  application  area 
to  help  the  parser/interpreter  and  example/trace  experts  fill  in  missing  information  in  the 
program  net. 


Model  Builder 

The program model-building expert (McCune, 1977) applies knowledge of what
constitutes a correct program to the conversion of the program net into a complete and
consistent program model, which will then be transformed during the synthesis phase into the
target language implementation. The model-building expert completes the model by filling in
the various pieces of required information and by analyzing the model for consistency; it
checks that its parts are legal both with respect to each other and with respect to
the semantics of the program-modeling language. Information is filled in by default, by
inference mechanisms (which are in the form of rules and which make use of consistency
requirements), or by queries to other experts, which may eventually result in a query to the
user. As an example, suppose that the program net contains "x part of y" and that the model
builder needs to determine whether "part of" means set membership, subset inclusion,
component of y, or the image of x under some correspondence relation with y, or whether there
might be an unspecified intervening subpart. Such information may be deducible from the
structures of x and y, if these structures are known or when they become known.
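A hypothetical sketch of the kind of rule the model builder might apply to "x part of y." Structures are encoded as tuples such as ("string",), ("set", ("string",)), or ("record", {field: type}), with None meaning the structure is not yet known. Only the membership, subset, and component readings are covered; the correspondence and intervening-subpart readings are omitted, and an undecided case returns None, standing for a query to another expert or to the user.

```python
def interpret_part_of(x_type, y_type):
    """Choose a meaning for "x part of y" from the structures of x and y.

    Structures are tuples such as ("string",), ("set", ("string",)) or
    ("record", {"name": ("string",)}); None means "not yet known".
    Returns None when no rule applies, i.e. the question must be passed on.
    """
    if x_type is None or y_type is None:
        return None  # wait until the structures are known, or ask
    if y_type[0] == "set":
        if x_type == y_type[1]:
            return "set membership"
        if x_type == y_type:
            return "subset inclusion"
    if y_type[0] == "record" and x_type in y_type[1].values():
        return "component of y"
    return None
```

The rules are tried in order, so a structure that satisfies several of them receives the first applicable reading.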
The model builder also corrects minor inconsistencies, adds cross-references, and
generalizes parts of the program description so that the synthesis phase has more freedom
in looking for a good implementation. Thus, if the program net specifies that a certain object
is to be a set of ordered pairs, the program model may, if appropriate, indicate that the
object is to be a correspondence (i.e., a functional mapping).
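The test behind generalizing a set of ordered pairs to a correspondence is whether the pairs are functional: no first element is paired with two different second elements. The sketch below checks this on a collection of pairs that is assumed to be known in advance; the function name is hypothetical, and the model builder itself reasons about the program description rather than about data values.

```python
def as_correspondence(pairs):
    """Return the pairs as a dict if they form a functional mapping, else None.

    A set of ordered pairs is functional when no first element is paired
    with two different second elements.
    """
    mapping = {}
    for first, second in pairs:
        if first in mapping and mapping[first] != second:
            return None  # not functional: stays a set of ordered pairs
        mapping[first] = second
    return mapping
```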


Coding  and  Efficiency  Experts 

These two experts are responsible for the synthesis phase. The coding expert's
knowledge base contains rules that transform parts of a program description into forms closer
to the target language. The goal of the efficiency expert is to guide the choice among the
different rules so that an efficient target language implementation eventually results. These
two experts, also known as the systems PECOS (the coding expert) and LIBRA (the efficiency
expert), are discussed more explicitly in articles devoted to them. See the articles 0B05.


Status 

The entire PSI system can now be used by a knowledgeable user. PSI has
successfully produced LISP code that implements the specifications given in English
dialogues for four variants of CLASS, a simple pattern classification program. In addition,
other tasks are handled by individual experts in a stand-alone mode; see, for example, D5 and
D9.


The  next  section  concludes  this  article  by  describing  examples  of  PSI  at  work. 


Examples 

Figure 2 illustrates an interactive natural language dialogue held with the PSI system in
which the user specifies a simple program, to be called CLASS. CLASS includes many
programming constructs necessary for more complex programs.
What is the name of the program you want to write?
** CLASS.

Does CLASS have a special program type?
** Classification.

Please describe CLASS.
** CLASS first inputs a concept. Then it repeatedly accepts an input
from the user, classifies it based on whether or not it fits the
concept, and outputs this classification to the user.

What is the structure of the concept?
** A concept is a collection of relations. Each relation is just a string.

What is the exit test of the loop?
** Stop when the user types "QUIT" instead of a scene.

What is the structure of the scene?
** The same as the concept.

Please define what it means for a scene to fit a concept.
** A scene fits a concept if all the relations in the concept are
in the scene.

What is the format of the classification output?
** Output "FIT" if the scene fit, "DIDN'T FIT" if it didn't.

Are we finished?
** Yes.


Figure 2: Specification by interactive natural language dialogue (CLASS).
In Figure 2, the user intends that the CLASS program input a sample set of items (e.g., a list of job
qualifications) and then repeatedly input a trial set (e.g., the qualifications of an applicant),
each time testing whether the sample set (required qualifications) is a subset of the trial set
(applicant qualifications) and printing "FIT" or "DIDN'T FIT" accordingly. The user further
intends that a person will be able to terminate CLASS simply by typing the word "QUIT"
instead of a trial set.
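For concreteness, here is a hand-written Python rendering of the behavior the user intends for CLASS, using the prompts from the program net. It is not PSI's output (PSI generated LISP). The function names are hypothetical, user input is supplied as a sequence rather than read from a terminal, and treating the end of input like "QUIT" is an assumption of this sketch.

```python
def fits(scene, concept):
    """A scene fits a concept if every relation in the concept is in the scene."""
    return set(concept) <= set(scene)


def class_program(inputs):
    """Run CLASS over a sequence of user inputs and return the lines it prints.

    The first input is the concept; each later input is a scene, i.e. a
    collection of relation strings, and the string "QUIT" ends the loop.
    """
    inputs = iter(inputs)
    printed = ["Ready for the CONCEPT"]
    concept = next(inputs)
    while True:
        printed.append("Ready for the SCENE")
        scene = next(inputs, "QUIT")  # assumption: end of input acts like QUIT
        if scene == "QUIT":
            break
        printed.append("FIT" if fits(scene, concept) else "DIDN'T FIT")
    return printed


if __name__ == "__main__":
    concept = {"(block a)", "(on a b)"}
    scenes = [{"(block a)", "(block b)", "(on a b)"}, {"(block a)"}, "QUIT"]
    print(class_program([concept] + scenes))
```

The subset test in `fits` is exactly the required-qualifications check described above: every item of the sample set must appear in the trial set.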

Based upon its understanding of the dialogue, the parser/interpreter expert produces
the program net, which is summarized in Figure 3 (the algorithmic part of the net is shown in
an ALGOL-like notation). Then the program model-building expert creates the very high-level,
complete, and consistent model of Figure 4. After repeated application of transformation
rules during the synthesis phase, the coding and efficiency experts will convert this model
into an efficient target language implementation.

A2 is either a set whose generic element is a string or a string whose
value is "QUIT".

A1 is a set whose generic element is a string.

A4 is the generic element of A1.

A3 is either TRUE or FALSE.

B1 is a variable bound to A2.

B2 is a variable bound to A1.

B3 is a variable bound to A4.


CLASS
    PRINT("Ready for the CONCEPT")
    A1 ← READ()
    LOOP1:
        PRINT("Ready for the SCENE")
        A2 ← READ()
        IF EQUAL(A2, "QUIT") THEN GO TO EXIT1
        A3 ← FIT(A2, A1)
        CASES IF A3 THEN PRINT("FIT")
              ELSE IF NOT(A3) THEN PRINT("DIDN'T FIT")
        GO TO LOOP1
    EXIT1:

FIT(B1, B2)
    FOR ALL B3 IMPLIES(MEMBER(B3, B2), MEMBER(B3, B1))

Figure 3: Summary of the program net.
[Figure 4: The program model. Recoverable elements of the listing: program CLASS declares a set-of-strings type, an alternative type that is either the string "QUIT" or such a set, string constants "FIT", "DIDN'T FIT", and "QUIT", a Boolean variable, and Boolean-valued procedures, including a0067, which tests the fit. The body inputs the concept from the user (prompt "READY FOR CONCEPT", error message "Illegal input. Input again:"), then, in a repeat loop, inputs a scene (prompt "READY"), leaves the loop when the input is "QUIT", applies a0067 to the scene and the concept, and informs the user "FIT" or "DIDN'T FIT" in a case statement.]

Traces are another method of specification allowed by the PSI system. Figure 6 shows
the use of a trace to specify part of the behavior of a program called TF ("Theory
Formation"), a simplified version of Pat Winston's concept formation program (Winston, 1975).
TF builds and updates an internal model of a concept. A concept is a collection of "may" and
"must" conditions. TF builds and updates the model by repeatedly reading in a scene,
guessing whether the scene is an instance of the concept, verifying with the person using
TF whether the guess was correct or incorrect, and updating the model of the concept
accordingly. The trace in Figure 6 specifies only part of the behavior of
TF: the part that describes how TF is to update the model, given that a scene does or does
not fit a concept. The other parts of TF can be specified by trace or by interactive natural
language dialogue.




Concept:          []
Scene:            [(block a)(block b)(on a b)]
Result of fit:    True
Updated concept:  [((block a) may)((block b) may)((on a b) may)]

Concept:          [((block a) may)((block b) may)((on a b) may)]
Scene:            [(block a)(block b)]
Result of fit:    False
Updated concept:  [((block a) may)((block b) may)((on a b) must)]

Concept:          [((block a) may)((block b) may)((on a b) must)]
Scene:            [(block a)(block b)(block c)(on a b)]
Result of fit:    True
Updated concept:  [((block a) may)((block b) may)((block c) may)
                   ((on a b) must)]

Figure 6: A specification by trace.


From this specification, the example/trace inference expert generates the following
information about the desired program: if the scene fits the concept, then add to the concept
all relations in the scene that are not present in the concept, and mark them with "may."
Otherwise, if the scene doesn't fit the concept, then change the marking of all relations
marked "may" in the concept and not appearing in the scene from "may" to "must."
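The inferred update rule is small enough to state directly in code. In this hypothetical sketch a concept is a list of (relation, mark) pairs and a scene is a list of relations. Relations newly added to the concept are appended at the end, so their order can differ from the figure even when the contents agree.

```python
def update_concept(concept, scene, fits):
    """Apply TF's update rule, as inferred from the trace, to a concept model.

    concept: list of (relation, mark) pairs, where mark is "may" or "must".
    scene:   list of relations.
    fits:    whether the scene was confirmed to be an instance of the concept.
    """
    if fits:
        # Add each scene relation not yet in the concept, marked "may".
        updated = list(concept)
        known = {relation for relation, _ in concept}
        for relation in scene:
            if relation not in known:
                updated.append((relation, "may"))
                known.add(relation)
        return updated
    # Promote every "may" relation that is missing from the scene to "must".
    return [
        (relation, "must") if mark == "may" and relation not in scene else (relation, mark)
        for relation, mark in concept
    ]
```

Feeding in the three concept/scene/result triples of the trace in order reproduces the three updated concepts.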


References 

See Barstow (1977a), Barstow (1977b), Barstow (1977c), Barstow & Kant (1977),
Ginsparg (1978), Green (1976a), Green (1975b), Green (1976a), Green (1976b), Green
(1976c), Green (1977), Green (1970), Green & Barstow (1976), Green & Barstow (1977a),
Green & Barstow (1977b), Green & Barstow (1976), Kant (1977), Kant (1976), McCune
(1977), Phillips (1977), and Shaw, Swartout, & Green (1976).
D. SAFE

The SAFE system, developed at the USC Information Sciences Institute by Robert Balzer,
Neil Goldman, David Wile, and Chuck Williams (with the recent addition of Lee Erman and Phil
London), accepts a program specification consisting of pre-parsed English, with limited
syntax and vocabulary, including terms from the problem domain. The phrases and sentences
of this specification, however, may be ambiguous and may fail to provide explicitly all the
information required in a formal program specification. Therefore, using a large number of
built-in constraints (that must be satisfied by any well-formed program), any specified
constraints on the problem domain, and occasional interaction with the user, SAFE resolves
ambiguities, fills in missing pieces of information, and produces a high-level, complete program
specification. To decide on missing pieces of information, SAFE uses a variety of techniques,
including backtracking (see article AI Languages) and a form of symbolic execution.

The SAFE system views the task of automatic programming as the production of a
program from a description of the desired behavior of that program. There are four major
differences between a conventionally specified program and a program described in terms of
its desired behavior.

Informality: The behavioral description is informal. It contains ambiguity
(alternative interpretations yielding distinct behaviors) and "partial"
constructs (constructs missing pieces of information that must be supplied
before any interpretation is possible). A conventionally specified program, on
the other hand, is formal; its meaning is completely and unambiguously defined
by the semantics of the programming language.

Vocabulary: The primitive terms used in the behavioral description are those of
the problem domain. General-purpose programming languages, on the other
hand, provide a primitive vocabulary that is significantly more independent of
particular problem areas.

Executability: Informality aside, it is possible, and sometimes desirable, to
describe behavior in terms of relationships between desired and achieved
states of a process, rather than by rules that specify how to obtain the
desired state. Conventionally specified programs must specify an algorithm
for reaching the desired state.

Efficiency: Conventionally specified programs contain many details of operation
beyond the desired input/output behavior. Among these are data
representation, internal communication protocols, store-recompute decisions,
etc., that affect a program's efficiency (utilization of computer resources and
time). In general, these details should not appear in the description of
input/output behavior.


When one writes a program in the conventional manner, one must formalize the
behavioral specification, translate the terms of the problem domain into those of a general
programming language, guarantee that the specified algorithms actually achieve the desired
results, and make a myriad of decisions for the sake of an efficient implementation.
The ISI group has attempted to split the task of creating a program into two separate
parts by designing a formal, complete specification language (Balzer & Goldman, 1070) that
allows behavioral specifications to be stated in terms specific to the problem domain while
avoiding efficiency and representational concerns. This formal specification language acts as
an interface between two projects that deal respectively with the first issue, translation
from informal to formal specifications, and the last issue, optimization of a formal
specification. The former project is the subject of this article, while the latter is described
elsewhere (Balzer, Goldman, & Wile, 1070). The other issues, domain-specific vocabulary
and executability, are addressed within the formal specification language.

The SAFE project has concentrated on only the first of the above specification issues:
automatically producing a formal description from an informal description. It is not, therefore,
a complete automatic programming system. The user of the SAFE system provides a
behavioral description in a pre-parsed, limited subset of English, including terms from the
problem area. SAFE then seeks to resolve all ambiguities and fill in
all missing information in a way that satisfies SAFE's knowledge of the constraints that all
programs must satisfy. The result is a complete, unambiguous, very high-level program
specification in a language called AP2.


Partial  Descriptions 

After studying many examples of program specifications written in English, the SAFE
research group concluded that the main semantic difference between these specifications
and their formal equivalents is that partial descriptions rather than complete descriptions
were used. When such partial descriptions were used, it was because the missing
information could be determined from the surrounding context. These partial descriptions
possess some of the useful properties of natural language specifications that are lacking in
formal languages. They focus both the writer's and the reader's attention on the relevant issues
and condense the specification. Furthermore, the extensive use of context almost totally
eliminates bookkeeping operations from the natural language specification.

A partial description may have zero, one, or more valid interpretations in a given
context. If a single valid interpretation is found for a description, it is unambiguous in that
context. Multiple valid interpretations indicate that there is not sufficient information from
the context to complete the description and that interaction with the user is required to
resolve the ambiguity. If a partial description possesses no valid interpretation, it is
inconsistent within the existing context.
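This three-way classification can be sketched directly, assuming the candidate completions of a partial description have already been enumerated and each constraint is a predicate over a candidate. The function name and the example constraint are hypothetical stand-ins for SAFE's contextual checks.

```python
def classify(candidates, constraints):
    """Classify a partial description by how many candidate completions are valid.

    candidates:  possible completions of the description.
    constraints: predicates every valid interpretation must satisfy.
    """
    valid = [c for c in candidates if all(check(c) for check in constraints)]
    if len(valid) == 1:
        return "unambiguous", valid
    if valid:
        return "ambiguous: ask the user", valid
    return "inconsistent", valid


if __name__ == "__main__":
    # Missing operand in "Do not mount a tape for a job unless the tape drive
    # has been assigned": to which job must the drive be assigned?
    candidates = ["assigned to the job being served", "assigned to any job"]
    # Hypothetical constraint standing in for SAFE's use of procedural context.
    constraints = [lambda c: c.endswith("the job being served")]
    print(classify(candidates, constraints))
```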

The SAFE system incorporates the most prevalent forms of partial descriptions found in
natural language specifications:

Partial sequencing: Operations are not always described in the order of
execution. While sequencing may sometimes be described explicitly, it is
frequently implicit in the relationships between operations. Example: "Output
generated while compiling is sent to a scratch file. This file must be opened in
write-only mode (the file should be opened before compiling commences)."


Missing operands: The operands of operations are frequently omitted because
they are recoverable from context. Recovering them may involve considering
the operation's definition, other operands, and the procedural context.
Example: "Do not mount a tape for a job unless the tape drive has been
assigned (to that job)."

Incomplete reference: A description of an object (or objects) may match several objects,
whereas it was intended to refer to only one or possibly a subset of these
objects. A complete description may be recovered by methods similar to those
for missing operands. Example: "When the mail program starts, it opens the
file named MESSAGE (in the directory of the job running the program)."

Type coercions: Often, people using natural language do not precisely specify
the object intended, but instead specify an associated object or a subpart of
an object. This situation can be recognized by a mismatch between the type
of object actually specified and the type of object expected. Example:
"Information messages are copied to each logged-in user (to the terminal of
the job of each logged-in user)."
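
Recovering from a type coercion can be viewed as a search for a chain of associations leading from the type actually specified to the type expected. The following Python sketch only illustrates that idea and is not SAFE's procedure: the association table `ASSOCIATIONS` and the function `coerce` are hypothetical, and breadth-first search is assumed so that the shortest chain (here user, job, terminal) is preferred.

```python
from collections import deque
from typing import Dict, List, Optional

# Hypothetical domain model: each object type and the types reachable by one association.
ASSOCIATIONS: Dict[str, List[str]] = {
    "user": ["job"],
    "job": ["terminal", "directory"],
    "directory": ["file"],
    "terminal": [],
    "file": [],
}


def coerce(given: str, expected: str) -> Optional[List[str]]:
    """Return the shortest chain of associations from the given type to the expected one."""
    if given == expected:
        return [given]                      # types already match: no coercion needed
    queue = deque([[given]])
    seen = {given}
    while queue:
        path = queue.popleft()
        for nxt in ASSOCIATIONS.get(path[-1], []):
            if nxt == expected:
                return path + [nxt]
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None                             # the mismatch cannot be repaired this way


# "copied to each logged-in user" where a terminal is expected
print(coerce("user", "terminal"))  # ['user', 'job', 'terminal']
```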


Operation of SAFE

The goal of SAFE is to complete the various partial descriptions in the user's
specification of the desired program so as to produce a formal specification of the whole
program. SAFE goes through several phases, but in all phases the system uses a variety of
constraints to achieve this goal. These include built-in
criteria that any formal program must meet (e.g., information must be produced before it is
consumed), built-in heuristics that "sensible" programs will meet (e.g., the value of a
conditional must depend on the program data), as well as any known or discovered
constraints particular to a program's domain (e.g., each file in a directory has a distinct
name). In fact, since programs are highly constrained objects, there are a large number of
constraints that any "well-formed" program must satisfy, and this is one reason programs are
hard to write.

In general, each partial description has several different possible completions. Based
on the partial description and the context in which it occurs, an ordered set of possible
completions is created for it. But one decision cannot be made in isolation from the others;
decisions must be consistent with one another, and the resulting program must make sense as
a whole, satisfying all the criteria of well-formed programs.

The problem of finding viable completions for a collection of partial descriptions
is a classical backtracking situation, since there are many interrelated individual
decisions that, in combination, can be either accepted or rejected on the basis of the
constraints. SAFE applies the constraints as early as possible so that infeasible combinations can be rejected early.
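
The backtracking scheme can be sketched as a depth-first search that tries each description's completions in preference order and checks the constraints after every choice, so that a bad combination is abandoned as soon as any part of it fails. This is a generic Python sketch under assumed names (`complete`, `consistent`, and the file/directory constraint are all hypothetical):

```python
from typing import Callable, Dict, List, Optional

Assignment = Dict[str, str]


def complete(order: List[str],
             candidates: Dict[str, List[str]],
             consistent: Callable[[Assignment], bool],
             partial: Optional[Assignment] = None) -> Optional[Assignment]:
    """Pick one completion per description, backtracking when a constraint fails."""
    partial = {} if partial is None else partial
    if len(partial) == len(order):
        return dict(partial)                  # every description has been completed
    name = order[len(partial)]
    for choice in candidates[name]:           # candidates are tried in preference order
        partial[name] = choice
        if consistent(partial):               # early rejection of a bad partial combination
            result = complete(order, candidates, consistent, partial)
            if result is not None:
                return result
        del partial[name]                     # undo this choice and try the next one
    return None                               # no completion works: caller must back up


# Hypothetical constraint: the file referred to must exist in the directory referred to.
FILES_IN = {"SYS": {"LOG"}, "USER": {"MESSAGE"}}


def ok(a: Assignment) -> bool:
    if "dir" in a and "file" in a:
        return a["file"] in FILES_IN[a["dir"]]
    return True


print(complete(["dir", "file"], {"dir": ["SYS", "USER"], "file": ["MESSAGE"]}, ok))
# {'dir': 'USER', 'file': 'MESSAGE'}
```

Here the first directory choice is rejected once no compatible file remains, and the search backs up to USER.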

The operation of SAFE consists of three sequential phases: the linguistic, planning,
and meta-evaluation phases. The cumulative effect of these phases is to produce a formal
specification that is composed of declarative and procedural portions. The declarative part,
or domain model, specifies the types of objects manipulated by the process, the various
ways they may relate to one another, the actions that may be performed on various object
types, and other global regularities of the problem domain. The procedural portion specifies
the controlled application of actions to objects.

The linguistic phase, using production rules, transforms the parse trees of the English
specification into fragments that retain the semantic content while discarding the syntactic
detail. The production rules capture many context-sensitive aspects of natural language,
such as various uses of the verb "be" and of quantifiers. The production rules may also add
declarations to the domain model, with user approval, when this is required for interpretation
of the input. This is accomplished by distinguishing two sets of conditions on each
rule: those relating to the linguistic form of the phrase being processed, and those relating the
form to the domain model. If the linguistic form conditions are satisfied (e.g., the clause
uses a transitive verb) but the domain model conditions are not (e.g., that the verb names an action
in the problem domain whose operands have types compatible with the verb's arguments), then
the domain model conditions are assumed, that is, added to the domain model.
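
The two sets of rule conditions can be sketched in Python as follows. The representation is hypothetical: "compatible" operand types are simplified to an exact match, and the `approve` callback stands in for asking the user before a declaration is added to the domain model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Clause:
    verb: str
    transitive: bool        # a linguistic-form fact about the parsed clause
    arg_types: List[str]    # types of the verb's arguments


@dataclass
class DomainModel:
    actions: Dict[str, List[str]] = field(default_factory=dict)  # action -> operand types


def verb_rule(clause: Clause, model: DomainModel,
              approve: Callable[[str, List[str]], bool]) -> Optional[Tuple[str, List[str]]]:
    """Interpret a transitive-verb clause as an action of the problem domain."""
    if not clause.transitive:
        return None                                   # linguistic conditions fail: rule does not fire
    declared = model.actions.get(clause.verb)
    if declared == clause.arg_types:
        return clause.verb, clause.arg_types          # domain conditions already hold
    if declared is None and approve(clause.verb, clause.arg_types):
        model.actions[clause.verb] = list(clause.arg_types)  # assume them: extend the domain model
        return clause.verb, clause.arg_types
    return None                                       # incompatible declaration, or not approved


model = DomainModel({"mount": ["tape", "drive"]})
print(verb_rule(Clause("open", True, ["file"]), model, lambda verb, types: True))  # ('open', ['file'])
print(model.actions["open"])                                                      # ['file']
```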

The planning phase determines the overall sequencing of the operations in the program.
It also determines which fragments belong together and how they are to interact. It does
this by using explicit sequencing information in the description, such as "A is executed
immediately after B" or "A is invoked whenever the condition C becomes true," as well as
static flow constraints on well-formed processes such as:

Before information is consumed (used by one fragment), it must be produced
(created by the same or another fragment).

Expected outputs of the whole program or of a subprogram must be produced
somewhere within that program.

The results of each described operation must be used or referenced somewhere.
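
The three static flow constraints lend themselves to a simple check followed by a topological sort. The Python sketch below assumes each fragment is summarized by the items it consumes and produces (hypothetical names loosely based on the scratch-file example earlier in this article) and uses the standard `graphlib` module (Python 3.9 or later). SAFE's planner is not claimed to work this way.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set, Tuple

Fragment = Tuple[Set[str], Set[str]]   # (items consumed, items produced)


def plan_order(fragments: Dict[str, Fragment], outputs: Set[str]) -> List[str]:
    """Check the static flow constraints, then order fragments producer-first."""
    producers: Dict[str, str] = {}
    consumed: Set[str] = set()
    for name, (used, made) in fragments.items():
        consumed |= used
        for item in made:
            producers[item] = name            # simplification: one producer per item
    produced = set(producers)
    if consumed - produced:                   # consumed information must be produced
        raise ValueError(f"never produced: {sorted(consumed - produced)}")
    if outputs - produced:                    # expected outputs must be produced somewhere
        raise ValueError(f"missing outputs: {sorted(outputs - produced)}")
    if produced - consumed - outputs:         # every result must be used somewhere
        raise ValueError(f"unused results: {sorted(produced - consumed - outputs)}")
    # Each fragment's predecessors are the producers of what it consumes.
    graph = {name: {producers[i] for i in used if producers[i] != name}
             for name, (used, _) in fragments.items()}
    return list(TopologicalSorter(graph).static_order())


fragments = {
    "compile": ({"source", "scratch-file"}, {"compiler-output"}),
    "open-scratch-file": (set(), {"scratch-file"}),
    "read-source": (set(), {"source"}),
}
print(plan_order(fragments, outputs={"compiler-output"}))
```

Any order the sorter returns places the opening of the scratch file before compilation, which is the implicit sequencing the natural language description left unstated.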

The final phase, meta-evaluation, uses dynamic constraints to help determine the
proper completion of partial descriptions. Dynamic constraints are those that apply, or at
least relate, to the program during execution. Examples of such constraints are:

It must be possible (in general) to execute both branches of a conditional
statement (otherwise, why would the user have specified a conditional?).

The constraints of a domain must not be violated.

Since no actual input data is available for testing the execution of the program, and
since the program must be well-formed for all allowable inputs, inputs are represented
symbolically. Instead of being actually executed, the program is symbolically executed on these inputs,
which provides a much stronger test of the constraints than execution on any
particular set of inputs would. The result is a database of relationships between the symbolic
values and, implicitly, a database of relationships between the program variables that are bound
to these values.
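
As an illustration of how a symbolic state can enforce the conditional constraint, the Python sketch below keeps a hypothetical table of propositions already known about the symbolic inputs. A test that the table already decides means one branch can never run, so the interpretation that produced it is rejected; otherwise execution splits into two branches, each recording the test's outcome. The proposition strings are invented for the example.

```python
from typing import Dict, Tuple

Facts = Dict[str, bool]   # proposition about the symbolic inputs -> known truth value


def split_on(test: str, facts: Facts) -> Tuple[Facts, Facts]:
    """Symbolically execute a conditional, returning the facts holding in each branch.

    Raises ValueError when the known facts already decide the test, since then
    only one branch could ever run.
    """
    if test in facts:
        branch = "then" if facts[test] else "else"
        raise ValueError(f"{test!r} is already known; only the {branch} branch can run")
    return {**facts, test: True}, {**facts, test: False}


# Hypothetical state after a table search: the table is known to exist, but
# whether an entry for the symbolic subscriber was found is still open.
facts: Facts = {"exists(TABLE)": True}
then_facts, else_facts = split_on("found(entry-for(SUBSCRIBER))", facts)
try:
    split_on("exists(TABLE)", facts)     # an interpretation testing the table itself
except ValueError as err:
    print("rejected:", err)
```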

All decisions concerning the proper interpretation of partial descriptions that affect the
computation up to some point in the execution (but not beyond) must be made before these
dynamic criteria can be tested at that point in the execution. Thus, decisions are made as
they are needed by the computation of the program, and the symbolic state of the program is
examined at each stage of the computation. This arrangement allows the dynamic
state-of-computation criteria to be used to obtain early rejection of infeasible alternatives.

There is an additional point worth noting. Representing the complete state of a
computation during symbolic execution is very difficult (e.g., it is quite hard to determine the
state after execution of a loop or conditional statement) and more detailed than necessary
for testing the constraints. Therefore, the SAFE system uses a weaker form of symbolic
interpretation, called Meta-Evaluation, which only partially determines the program's state as
the computation proceeds (e.g., loops are executed only once, for some "generic" element).

Notice that symbolic execution requires that the sequential relationships between the
fragments be known; therefore, the meta-evaluation phase must follow the planning phase.

Finally, the global referencing constraints (such as "The body of a procedure must
make use of the procedure's parameters") test the overall use of names within the program
and, thus, cannot be tested until all decisions have been made. These criteria can be tested
only after the Meta-Evaluation is complete.
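
A check in the spirit of the parameter-use constraint is easy to state for ordinary source code. The Python sketch below is an analogy rather than part of SAFE: it applies the rule to Python functions using the standard `ast` module, and the function name `unused_parameters` is hypothetical.

```python
import ast
from typing import Dict, List


def unused_parameters(source: str) -> Dict[str, List[str]]:
    """Map each function defined in source to the parameters its body never mentions."""
    report: Dict[str, List[str]] = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = node.args
            params = [a.arg for a in args.posonlyargs + args.args + args.kwonlyargs]
            used = {n.id for stmt in node.body for n in ast.walk(stmt)
                    if isinstance(n, ast.Name)}
            missing = [p for p in params if p not in used]
            if missing:
                report[node.name] = missing
    return report


SOURCE = """
def delete(key, table):
    return table.pop(hash(key), None)

def log(message, level):
    print(message)
"""
print(unused_parameters(SOURCE))  # {'log': ['level']}
```

As in SAFE, such a check only makes sense once every name in the program has been resolved, which is why it comes last.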


Status 


The prototype system has successfully handled the 75-200 word specifications of
three quite distinct programs. In these cases, the SAFE output of a completed specification,
including domain structure definition, requires approximately two pages. One example
concerned part of a system for scheduling transmissions in a communications network. Given
a table (SOL) containing entries for various network subscribers and for various unassigned
time slots (RATS), a schedule of absolute times when a particular subscriber could broadcast
on the network was tabulated. The input specification to SAFE is:

((THE SOL
  (IS SEARCHED)
  FOR
  (AN ENTRY FOR (THE SUBSCRIBER)))
 (IF ((ONE)
      (IS FOUND))
     ((THE SUBSCRIBER'S (RELATIVE TRANSMISSION TIME))
      (IS COMPUTED) ACCORDING-TO ("FORMULA-1"))
     ((THE SUBSCRIBER'S (CLOCK TRANSMISSION TIME))
      (IS COMPUTED) ACCORDING-TO ("FORMULA-2")))
 (WHEN ((THE TRANSMISSION TIME)
        (HAS BEEN COMPUTED))
       ((IT)
        (IS INSERTED)
        AS (THE (PRIMARY ENTRY))
        IN (A (TRANSMISSION SCHEDULE))))
 (FOR (EACH RATS ENTRY)
      (PERFORM)
      (: ((THE RATS'S (RELATIVE TRANSMISSION TIME))
          (IS COMPUTED) ACCORDING-TO ("FORMULA-1"))
         ((THE RATS'S (CLOCK TRANSMISSION TIME))
          (IS COMPUTED) ACCORDING-TO ("FORMULA-2"))))
 ((THE RATS (TRANSMISSION TIMES))
  (ARE ENTERED)
  INTO (THE SCHEDULE)))


Figure 1. Actual input for the link scheduling example.

In formalizing this description, SAFE encountered and resolved the following
characteristics of informal specifications:


number of missing operands = 7
number of incomplete references = 12
number of implicit type coercions = 3
number of implicit sequencing decisions = 4


The robustness of the system has been increased by processing a number of
perturbations of each of the major examples. These have involved specifying the same
process but varying the syntax and vocabulary used, the partial descriptions used, and the
formal knowledge provided about the problem domain.


Future Developments

The key technical restrictions of the prototype system appear to be (a) the sequential
application of the three phases, which prohibits adequate interactions between the
expertise embodied in each, and (b) the backtracking within the meta-evaluation phase,
which corresponds to restarting the symbolic execution from an earlier point and can lead
to much unnecessary search. To correct these limitations, a reformulation of the system
architecture within a framework derived from the HEARSAY II speech understanding system
(see article Speech.C) is currently in progress. This framework consists of a number of
cooperating experts interacting via a "blackboard" database.

Simultaneously, the system is being scaled up to handle larger practical specifications
(approximately 20 pages). Later, the project will consider the formalization of incremental
informal specifications so that it can also provide help during both specification formulation
and maintenance activities.



E. Programmer's Apprentice

The Programmer's Apprentice (PA) is an interactive system for assisting programmers
with the task of programming. It is being designed and implemented at MIT by Charles Rich,
Howard Shrobe, and Richard Waters. Currently, most, but not all, of the modules that
comprise the PA system are running. It should be kept in mind that the scenario described
here illustrates the projected operation of the system, not its present operation. The intent
of the PA is that the programmer will do the hard parts of design and implementation, while
the PA will act as a junior partner and critic, keeping track of details and assisting the
programmer in the documentation, verification, debugging, and modification of his program. In
order to cooperate with the programmer in this fashion, the PA must be able to "understand"
what is going on. From the point of view of Artificial Intelligence, the central development of
the Programmer's Apprentice project has been the design of a representation (called a
"plan") for programs and for knowledge about programming that serves as the basis for this
"understanding." Developing and reasoning about plans is the central activity of the PA.

The "plan" for a program represents the program as a network of operations
interconnected by links explicitly representing data flow and control flow. The advantage of
this aspect of the plan formalism is that it abstracts away from the specific syntactic
constructs used by various programming languages to implement control flow and
data flow. The most novel aspect of the plan formalism is that it goes beyond this level in
order to create a vehicle for expressing the logical interrelationships in a program. First, a
plan is not just a graph of primitive operations. Rather, it is a hierarchy of segments within
segments, where each segment corresponds to a unit of behavior and has an input/output
specification that describes features of this behavior. The plan specifies how each
nonterminal segment is constructed out of the segments contained within it. This
segmentation is important because it breaks the plan up into localities that can be
understood in isolation from each other. Second, the behavior of a segment is related to the
behavior of its subsegments. This interrelationship is represented by explicit dependency
links that record the goal-subgoal and prerequisite relationships between the input/output
specification for a segment and those for its subsegments. Taken together, the links
summarize a proof of how the specifications for a segment follow from the specifications
of its subsegments and from the way the subsegments are interconnected by control flow
and data flow. A final aspect of the plan formalism is that there may be more than one plan
for a given segment of a program, with each plan representing a different point of view on
the segment. The data structures used by a program are represented by specifying their
parts, properties, and the relationships between them, in a manner similar to data
abstractions (Zilles, 1976; Liskov, 1977).
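
A toy encoding of the segment hierarchy may make the formalism more concrete. In the Python sketch below, which is an illustration and not the PA's representation, each segment has named inputs and outputs, data-flow links are implied by matching names, and each nonterminal segment is checked in isolation. The specifications and dependency links that distinguish real plans are omitted, and modelling DELETE's result as a distinct `new-table` is an assumption made so that a missing step shows up as a data-flow gap.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Segment:
    name: str
    inputs: List[str]
    outputs: List[str]
    parts: List["Segment"] = field(default_factory=list)   # subsegments, in control-flow order


def dataflow_gaps(seg: Segment) -> List[str]:
    """Report data-flow links that a nonterminal segment cannot supply."""
    problems: List[str] = []
    available = set(seg.inputs)
    for part in seg.parts:
        for item in part.inputs:
            if item not in available:
                problems.append(f"{seg.name}: {part.name} needs {item!r} before it exists")
        available.update(part.outputs)
        problems.extend(dataflow_gaps(part))          # each segment is checked in isolation
    if seg.parts:
        for item in seg.outputs:
            if item not in available:
                problems.append(f"{seg.name}: output {item!r} is never produced")
    return problems


steps = [
    Segment("hash", ["key"], ["index"]),
    Segment("fetch", ["table", "index"], ["bucket"]),
    Segment("splice-out", ["key", "bucket"], ["new-bucket"]),
]
delete = Segment("DELETE", ["key", "table"], ["new-table"], steps)
print(dataflow_gaps(delete))   # ["DELETE: output 'new-table' is never produced"]
delete.parts.append(Segment("store", ["table", "index", "new-bucket"], ["new-table"]))
print(dataflow_gaps(delete))   # []
```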

Knowledge about programming in general is also represented using plans and data
structure descriptions. This knowledge is stored in the PA in a database of common
algorithms and data structure implementations called the "plan library." The PA's
"understanding" of a program is embodied in a hierarchical plan for it. In general, the subplan
for each individual segment in terms of its subsegments will be an instance of some plan
stored in the plan library. This structure gives the PA access to all of the information stored
in the plan library about the particular subplan as soon as it can make a guess as to what the
subplan is.

A Scenario of Use of the Programmer's Apprentice

The following imagined conversation between a programmer and the PA is presented in
order to illustrate the intended operation of the system. (Comments discussing the scenario
are printed in italics.) The scenario illustrates the following four basic areas in which the PA
can assist a programmer:

(1) Documentation: One of the primary services the PA provides is automatic,
permanent, and in-depth documentation of the program. The PA remembers
not only explicit commentary supplied by the programmer with the code, but
also a substantial body of derived information describing the logical
structure underlying the program, such as the dependency relationships
between parts of the program.

(2) Verification: The development of a program is accompanied by the
construction of a sequence of plans at various levels of abstraction. At
each step, the PA attempts to verify that the current plan is both
consistent and sufficient to accomplish the desired goal. As more
information is specified, the PA's reasoning about these plans approaches a
complete verification of the program.

(3) Debugging: Any discrepancy between the PA's understanding of the
programmer's intent and the actual operation of the program is reported to
the programmer as a potential bug.

(4) Managing Modification: Perhaps the most useful aspect of the PA is that it
can help a programmer modify his program without introducing new bugs.
Based on its knowledge of the logical relationships between parts of a
program, the PA is able to determine what parts of a program can be
affected by a proposed change, and how they can be affected. It can use
this information to warn the programmer of impending difficulties.

The scenario traces the design, coding, and subsequent modification of a program that
deletes an entry from a hash table. The scenario picks up in the middle of a session, at a
point where the programmer has already made many design choices and conveyed them to
the PA. In particular, he has stated the input-output specifications for the subroutine
DELETE and chosen the basic data structures to be used. The hash table is implemented as
an array of linked lists called "buckets." Each of the entries in a bucket is a dotted pair
whose CAR is the key for the entry.

Programmer: The plan for the routine DELETE has three main steps: The
first one hashes the input key to get an index; the second one fetches the
bucket corresponding to that index from the table; and the third splices
out the entry in the bucket whose key part is equal to the input key.


The only feature of this scenario that is not a current research goal of the Programmer's
Apprentice project is the use of natural English dialogue. Although the system should be
able to support a sophisticated natural language "front-end," no effort has been expended
in this direction to date. As a result, the programmer would actually type in something
like:

"(subsegments-of DELETE step1 step2 step3)
 (specifications-for step1 (inputs key1) (outputs index)
   (assert (= index (hash key1))))
 ..."

The PA uses the programmer's outline of the design in order to build a plan representing
the program's intended structure. References to basic programming techniques, such as
"splicing out" (deleting elements from a linked list), trigger the retrieval of plans from the
plan library.


PA: I can't verify that this plan satisfies the specifications for
DELETE. In particular, when the input key is equal to the key part of
the first entry in a bucket, I can't prove that the entry is no longer in
the table after DELETE is executed. Note that in this situation, the
bucket after splicing out entries has a different initial cell from the
bucket before splicing out entries, and that therefore the table isn't
changed.


After constructing a plan, the PA uses its reasoning abilities in order to check for bugs.
Above, the PA finds one and reports it to the programmer. In general, it is the
responsibility of the programmer to supply corrections. Below, the programmer corrects the
plan by issuing semantically based editing commands to the PA.

Programmer: Modify the plan as follows. Add a final step that compares
the bucket after splicing with the bucket before splicing and stores the
new bucket in the table under the index if the two initial cells are not
identical.

PA: OK, the plan now seems correct.

The programmer then types in the code for DELETE. In cases where the programmer
specifies the plan for a program as completely as he does here, he could ask the PA to
write the code. It is expected that the programmer will actually write code only in
situations where it is easier to write the code than to fully explain the plan.

Programmer: (DEFUN DELETE (KEY) ; hash table delete routine.
              (PROG (BKT1 BKT2)
                    (SETQ BKT1 (TBL (HASH KEY)) BKT2 (BKT-DELETE KEY BKT1))
                    (COND ((NOT (EQ BKT1 BKT2))
                           ; check if first entry in bucket deleted.
                           (STORE (TBL (HASH KEY)) BKT1)))))


The PA analyzes the code written by the programmer in order to construct a plan for it.
Comparing this plan with the plan directly described by the programmer, the PA discovers
an implementation bug: in the last line of code, the user probably meant to store into the
hash table not the first bucket, BKT1, but the second bucket, BKT2.

PA: This code is not consistent with your plan. When the first entry is
deleted, the code stores the bucket before splicing, rather than the
bucket after splicing. Did you mean
"(STORE (TBL (HASH KEY)) BKT2)"?

Programmer: Yes, fix the code. Also, use the standard list deletion
algorithm to code BKT-DELETE so that it splices out of the bucket all
entries whose key equals the indicated value.

PA: (DEFUN BKT-DELETE (KEY BUCKET) ; list deletion routine coded by the PA.
      (PROG (CURRENT PREVIOUS)
            (SETQ CURRENT BUCKET PREVIOUS NIL)
       LP   (COND ((NULL CURRENT) (RETURN BUCKET))
                  ((EQUAL (CAAR CURRENT) KEY)
                   ; if keypart equal, splice out entry.
                   (COND (PREVIOUS (RPLACD PREVIOUS (CDR CURRENT)))
                         ; special if first element deleted
                         (T (SETQ BUCKET (CDR BUCKET)))))
                  (T (SETQ PREVIOUS CURRENT)))
            (SETQ CURRENT (CDR CURRENT))
            (GO LP)))


The PA is able to code a program like this itself, even though the programmer did not
specify a plan for it, because it is an implementation of a standard algorithm that
corresponds to a plan in the plan library. Code written by the PA can be freely mixed
with code written by the programmer, and each can edit the other's work.
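
For readers less familiar with LISP list surgery, here is a Python transliteration of DELETE (with the PA's correction) and BKT-DELETE. It is a sketch under assumptions: a small `Cons` class stands in for LISP cons cells, the table is a Python list indexed by a caller-supplied hash function, and the helpers `from_pairs` and `keys` are invented for the example.

```python
from typing import Any, Callable, List, Optional


class Cons:
    """A mutable pair standing in for a LISP cons cell."""

    def __init__(self, car: Any, cdr: Any = None) -> None:
        self.car, self.cdr = car, cdr


def from_pairs(pairs: List[tuple]) -> Optional[Cons]:
    """Build a bucket: a list of cells whose CARs are (key . value) entries."""
    bucket = None
    for key, value in reversed(pairs):
        bucket = Cons(Cons(key, value), bucket)
    return bucket


def keys(bucket: Optional[Cons]) -> List[Any]:
    out = []
    while bucket is not None:
        out.append(bucket.car.car)
        bucket = bucket.cdr
    return out


def bkt_delete(key: Any, bucket: Optional[Cons]) -> Optional[Cons]:
    """Splice out every entry whose key equals KEY, as in BKT-DELETE."""
    current, previous = bucket, None
    while current is not None:
        if current.car.car == key:                # (EQUAL (CAAR CURRENT) KEY)
            if previous is not None:
                previous.cdr = current.cdr        # RPLACD: splice by side effect
            else:
                bucket = bucket.cdr               # first cell deleted: new initial cell
        else:
            previous = current
        current = current.cdr
    return bucket


def delete(key: Any, table: List[Optional[Cons]], hash_fn: Callable[[Any], int]) -> None:
    """DELETE with the PA's fix: store BKT2, the bucket after splicing."""
    index = hash_fn(key)
    bkt1 = table[index]
    bkt2 = bkt_delete(key, bkt1)
    if bkt1 is not bkt2:                          # (NOT (EQ BKT1 BKT2))
        table[index] = bkt2


table = [from_pairs([("a", 1), ("b", 2)])]
delete("a", table, lambda k: 0)                   # every key hashes to slot 0 here
print(keys(table[0]))  # ['b']
```

Storing BKT1 instead of BKT2, as in the programmer's first version, leaves the table pointing at the old initial cell, which still holds the deleted entry.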

Some time later it is necessary to modify the representation of the buckets. The PA helps
the (perhaps new) programmer maintain consistency.

Programmer: I want to make a change in the representation of buckets in
the hashing programs. A bucket now has two parts: a count field and a
list of entries, as before. The count field will contain an integer equal
to the length of the list of entries. Let a bucket be a dotted pair whose
CAR is the count and whose CDR is the list of entries.

Then the PA inspects the logical dependencies in the plans. It discovers that a type
requirement has been violated.

PA: The proposed change will affect DELETE as follows: After the
bucket is fetched, it is passed to BKT-DELETE, which does the actual
deletion. However, BKT-DELETE requires a list of entries, while in
this new implementation a bucket is no longer just a list of entries.

Programmer: Explain the plan for BKT-DELETE please.

PA: BKT-DELETE is implemented following the standard list deletion
plan. The inputs are a key and a list of entries. The plan is a search
loop using two pointers: a pointer to the current entry, which is
initialized to the input list, and a trailing pointer, which is initialized
to NIL. On each iteration the loop tests the key of the current
entry. If it is equal to the input key, the element is spliced out of the
list by RPLACDing the trailing pointer.

Acting in its role of documentation, the PA describes BKT-DELETE by
reporting the structure of its plan. The programmer then decides how to modify
BKT-DELETE.

Programmer: Change the specifications of BKT-DELETE so that it
expects as input a bucket of the new type, rather than a list of entries.
Also change the initialization of the pointers to read:
"(SETQ PREVIOUS BUCKET CURRENT (CDR BUCKET))".


The PA reanalyzes the program in order to verify that this change fixes the problem
without causing any other problems to arise. This analysis reveals the presence of
inaccessible code, which the PA reports to the programmer.

PA: OK, this fixes the problem. However, now the special-case tests in
BKT-DELETE and DELETE are not necessary, since PREVIOUS is
never NIL and therefore elements are always deleted by side effect.


And so the session continues, with the PA looking over the programmer's shoulder.
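
The effect of the new representation can be seen in a Python sketch of the modified BKT-DELETE. Here the bucket is a header cell whose CAR is the count; PREVIOUS starts at the header and CURRENT at its CDR, so PREVIOUS is never empty and every deletion is a splice by side effect. The scenario does not say how the count is kept up to date; decrementing it on each deletion is an assumption of this sketch, as are the helper names `make_bucket` and `entry_keys`.

```python
from typing import Any, List


class Cons:
    """A mutable pair standing in for a LISP cons cell."""

    def __init__(self, car: Any, cdr: Any = None) -> None:
        self.car, self.cdr = car, cdr


def make_bucket(pairs: List[tuple]) -> Cons:
    """New representation: a header cell (count . entries)."""
    entries = None
    for key, value in reversed(pairs):
        entries = Cons(Cons(key, value), entries)
    return Cons(len(pairs), entries)


def entry_keys(bucket: Cons) -> List[Any]:
    out, cell = [], bucket.cdr
    while cell is not None:
        out.append(cell.car.car)
        cell = cell.cdr
    return out


def bkt_delete(key: Any, bucket: Cons) -> Cons:
    """Modified BKT-DELETE: PREVIOUS starts at the header, so it is never NIL."""
    previous, current = bucket, bucket.cdr    # (SETQ PREVIOUS BUCKET CURRENT (CDR BUCKET))
    while current is not None:
        if current.car.car == key:
            previous.cdr = current.cdr        # always a splice by side effect
            bucket.car -= 1                   # assumption: keep count = list length
        else:
            previous = current
        current = current.cdr
    return bucket                             # always the same header cell


bucket = make_bucket([("a", 1), ("b", 2), ("a", 3)])
same = bkt_delete("a", bucket)
print(same is bucket, bucket.car, entry_keys(bucket))  # True 1 ['b']
```

Because the function always returns the same header cell, DELETE's EQ test can never succeed and its STORE can never run, which is the inaccessible code the PA reported.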


Operation of the System

The design of the PA is based on four modules (a surface analyzer, a recognizer, an
interactive module, and a deductive module) and two databases (the plan library and a
scratch pad called the "design notebook"). All of the modules except the interactive module
have been implemented, at least in part. As described above, the plan library contains the PA's knowledge of
programming in general. The design notebook contains the PA's evolving knowledge of the
particular programs being worked on and serves as the communication center for the system
as a whole. The modules communicate with one another solely by making assertions in the
design notebook. Each module has predefined trigger patterns that cause it to perform
specific tasks (such as making a deduction or querying the user) whenever appropriate
assertions appear in the notebook. Every assertion added to the notebook is also
accompanied by a justification of its presence. These justifications make it possible for the
PA to account for its actions.
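
The notebook-centered organization resembles a blackboard architecture, and its essentials can be sketched in a few lines of Python. Everything here is illustrative: the assertion tuples, the predicates "code" and "plan", and the two stand-in modules are hypothetical, and real PA modules perform recognition and deduction rather than fixed responses. The sketch shows only the three properties described above: modules communicate solely through the notebook, trigger patterns fire when matching assertions appear, and each assertion carries a justification.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Assertion = Tuple[str, ...]
Handler = Callable[["DesignNotebook", Assertion], None]


@dataclass
class Entry:
    assertion: Assertion
    justification: str                    # why the assertion is in the notebook


class DesignNotebook:
    """Shared store through which (hypothetical) modules communicate."""

    def __init__(self) -> None:
        self.entries: List[Entry] = []
        self.triggers: List[Tuple[str, Handler]] = []

    def on(self, predicate: str, handler: Handler) -> None:
        """Register a trigger pattern: call handler for assertions with this predicate."""
        self.triggers.append((predicate, handler))

    def add(self, assertion: Assertion, justification: str) -> None:
        """Record a justified assertion and fire any matching triggers."""
        if self.why(assertion) is not None:
            return                        # already known; keep the first justification
        self.entries.append(Entry(assertion, justification))
        for predicate, handler in list(self.triggers):
            if assertion[0] == predicate:
                handler(self, assertion)  # may add further justified assertions

    def why(self, assertion: Assertion) -> Optional[str]:
        """Return the justification for an assertion, or None if it is absent."""
        for entry in self.entries:
            if entry.assertion == assertion:
                return entry.justification
        return None


nb = DesignNotebook()
# Stand-in recognizer: typed-in code triggers a guess at the matching library plan.
nb.on("code", lambda n, a: n.add(("plan", a[1], "list-deletion"),
                                 f"recognizer matched {a[1]} to a library plan"))
# Stand-in deductive module: a proposed plan triggers an attempt to verify it.
nb.on("plan", lambda n, a: n.add(("verified", a[1]), f"deduction from plan {a[2]}"))
nb.add(("code", "BKT-DELETE"), "typed in by the programmer")
print(nb.why(("verified", "BKT-DELETE")))  # deduction from plan list-deletion
```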

The surface analyzer is used to construct simple surface plans for sections of code
written by the programmer. It is the only module whose implementation depends on the
particular programming language being used. To date, surface analyzers have been
implemented for both LISP and FORTRAN. The recognition module takes over where the
surface analyzer leaves off in order to construct a detailed plan for a piece of code. It first
breaks up the surface plan by identifying weakly interacting subsegments that can be
further analyzed in isolation from each other. It then compares these subsegments with the
plans in the library in order to determine more detailed plans for the program.

The interactive module is the communication link between the PA and the programmer.
It converts the programmer's input (which can consist of code, direct specification of a plan,
or various requests) into assertions in the design notebook and decides what to say to the
programmer based on the information currently in the notebook. The deductive module
operates in the background in cooperation with all of the other modules. It performs the
deductions necessary to verify a proposed match between a program and a plan, to detect
bugs in a plan, and to determine the ramifications of a proposed modification to a program or
plan.


At a given moment, the design notebook holds the sum total of what the PA knows
about the program being worked on. This information triggers additional activity by the
modules. If the recognizer and deductive modules are strong enough and the program is
simple enough, this process will culminate in a complete understanding and verification of the
program. Typically, however, this will not be the case, and some questions (such as the
exact plan for a segment or the correctness of a specification) will remain unresolved in the
notebook. The flexible architecture chosen for the PA makes it possible for the PA to exhibit
useful partial performance in this situation. It is able to ignore what it doesn't understand
and work constructively with what it does understand. The programmer can be called upon
to fill in the gaps.

Current Status of the Programmer's Apprentice

Rich and Shrobe (1976) laid out the basic idea of a plan and the initial design of the
PA. Since that time, Rich, Shrobe, and Waters have been working together on further aspects
of the theory, along with the design and implementation of the PA.

Rich's work (forthcoming) centers on the plan library and the recognition process. He is
using the plan representation in order to codify a large body of common programming
strategies in the domain of nonnumerical programming. He is also designing a recognition
module that will be able to identify instances of plans in the library as they occur in
combination in a programmer's program.

Shrobe (1978) has implemented a prototype deductive module that can reason about
programs represented by plans. An important aspect of its operation is that it maintains a
record of the dependency relationships embodied in its deductions. In doing this, it builds up
some of the logical structure that is a vital part of a plan for a program. He is currently
designing an improved version of this deductive module.

Waters (1976, 1978) has implemented a system that can analyze the code for a
program and produce the basic structure of a plan for the entire program. The system
corresponds to the surface analysis module and the initial phase of the recognition process.
The basic idea behind Waters' work is that plans for typical programs are built up in a small
number of stereotyped ways and that features in the code for a program can be used to
determine how the plan for the program should be built up.

The goal for the immediate future is to construct a prototype system that can exhibit
the kind of behavior shown in the scenario. To do this, an interactive module must be built,
and the other modules must be connected together into an integrated system. Looking
further ahead, additional modules (such as a simple program synthesis module and one
dealing with efficiency issues) will be added to the PA, and the existing ones will be
strengthened so that the PA can assume an even larger part of the programming process.
F. PECOS

Developed in 1970 by David Barstow (Barstow, 1976), the automatic programming
system PECOS serves as the coding expert of Stanford's PSI project (see article OS and
Barstow, 1979). The foundations of PECOS are based on ideas presented in Green & Barstow
(1977a) and Green & Barstow (1978). Though PECOS can act in conjunction with the PSI
system, it can also stand on its own and interact directly with the user. The original problem
area of PECOS was symbolic programming, which includes simple list processing, sorting,
database retrieval, and concept formation. This domain has recently been extended to graph
theory and simple number theory. Programs are specified in terms of very high-level
constructs, including data structures like collections or mappings and operations like testing for
membership in a collection or computing the inverse image of an object under a mapping.
Knowledge about programming in the problem area has been codified (i.e., made explicit and
put into machine-usable form) primarily in the form of transformation rules, and these have
been entered into PECOS's knowledge base. Most of the rules describe how constructs and
operations can be represented or implemented in terms of other constructs and operations
that are closer to, or actually in, the target language, LISP (actually a subset of INTERLISP;
Teitelman et al., 1978). These rules can identify design decisions and can also serve as
limited explanations.

The operation of the system proceeds by the repeated selection and application of the
transformation rules in the knowledge base to parts of the program. Also referred to as
gradual refinement, this transformation process reduces the high-level specification to an
implementation fully within the target language. Each application of a rule is said to produce
a partial implementation or refinement of the program, and the transformation rules are called
refinement rules.
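The refinement loop can be sketched in Python. This is a minimal illustration under stated assumptions, not PECOS's implementation: program fragments are nested tuples, each rule is a condition-action pair, the first applicable rule is taken, and the operator names (`is_member`, `collection`, `linked_list`, `MEMBER`) are hypothetical.

```python
from typing import Callable, NamedTuple, Optional, Union

# A program fragment is a nested tuple (operator, *arguments); leaves are names (str).
Term = Union[str, tuple]


class Rule(NamedTuple):
    name: str
    condition: Callable[[tuple], bool]  # does the rule apply to this node?
    action: Callable[[tuple], Term]     # the refinement that replaces the node


# Hypothetical target-language primitive for this toy example.
TARGET_OPS = {"MEMBER"}


def is_target(t: Term) -> bool:
    """True if every operator in t is a target-language primitive."""
    return isinstance(t, str) or (t[0] in TARGET_OPS and all(is_target(a) for a in t[1:]))


def rewrite_once(t: Term, rules: list) -> Optional[tuple]:
    """Refine the first node, in pre-order, to which some rule applies."""
    if isinstance(t, str):
        return None
    for rule in rules:
        if t[0] not in TARGET_OPS and rule.condition(t):
            return rule.action(t), rule.name
    for i, arg in enumerate(t[1:], start=1):
        result = rewrite_once(arg, rules)
        if result is not None:
            new_arg, name = result
            return t[:i] + (new_arg,) + t[i + 1:], name
    return None


def refine(t: Term, rules: list) -> tuple:
    """Gradual refinement: apply rules until t lies entirely in the target language."""
    history = []
    while not is_target(t):
        result = rewrite_once(t, rules)
        if result is None:
            raise ValueError(f"no rule applies to {t!r}")
        t, name = result
        history.append(name)  # each step yields a partial implementation
    return t, history


# Two hypothetical refinement rules (condition-action pairs).
RULES = [
    Rule("collection-as-linked-list",
         lambda n: n[0] == "collection",
         lambda n: ("linked_list", n[1])),
    Rule("membership-in-linked-list",
         lambda n: n[0] == "is_member" and isinstance(n[2], tuple) and n[2][0] == "linked_list",
         lambda n: ("MEMBER", n[1], n[2][1])),
]
```

Calling `refine(("is_member", "x", ("collection", "S")), RULES)` first refines the collection into a linked list and then the membership test into a target-language call; the returned history lists the rules in the order they were applied.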


Conflict  Resolution 

At some points during the transformation process, a conflict may arise because several
rules apply to the same part of the program. The handling of this situation is important: The
application of the several rules ultimately results in different target-language
implementations that often vary significantly in terms of efficiency. There are three ways to
handle this situation:

(1) If PECOS is interacting directly with the user, the user may select which rule
should be applied (and thus which implementation will be constructed).

(2) For the convenience of the user, PECOS can choose one of the applicable
rules, using about a dozen heuristics it has to pick the rule that leads to the
more efficient implementation. These heuristics handle about two-thirds of
the choices that typically arise.

(3) When no heuristic applies and the user is uncertain about which rule is
"best" for his or her purposes, PECOS can apply each in parallel,
constructing a separate implementation for each rule applied.
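The three ways of handling a conflict can be sketched as a single resolution function. This is an illustration only: choice points are given as plain names with their applicable rules, they are assumed to be independent of one another (in PECOS a choice can change what applies later), and the heuristic and all names are hypothetical.

```python
from itertools import product
from typing import Callable, Optional, Sequence

# A heuristic inspects a choice point and returns a rule name, or None if it does not apply.
Heuristic = Callable[[str, Sequence[str]], Optional[str]]


def resolve(construct: str, applicable: Sequence[str],
            heuristics: Sequence[Heuristic] = (),
            ask_user: Optional[Callable[[str, Sequence[str]], str]] = None) -> list:
    """Return the rule(s) to pursue at one choice point."""
    if ask_user is not None:              # (1) the user selects the rule
        return [ask_user(construct, applicable)]
    for h in heuristics:                  # (2) a heuristic selects the rule
        choice = h(construct, applicable)
        if choice is not None:
            return [choice]
    return list(applicable)               # (3) pursue every rule in parallel


def implementations(choice_points, **options) -> list:
    """One implementation per combination of surviving choices.
    Simplification: choice points are treated as independent of each other."""
    names = [c for c, _ in choice_points]
    per_point = [resolve(c, rules, **options) for c, rules in choice_points]
    return [dict(zip(names, combo)) for combo in product(*per_point)]


# Hypothetical heuristic: if a collection is mainly tested for membership,
# prefer the Boolean-mapping representation.
def prefer_boolean_mapping(construct, applicable):
    if construct == "membership-collection" and "boolean-mapping" in applicable:
        return "boolean-mapping"
    return None


POINTS = [
    ("membership-collection", ["linked-list", "boolean-mapping"]),
    ("enumeration-order", ["stored-order", "ordering-relation"]),
]
```

With no user and no heuristics, the two choice points yield four parallel implementations. The heuristic settles the first point, which leaves two. A user callback settles every point and leaves one.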

When PECOS functions as the Coding Expert of the PSI program synthesis system
(Green, 1976b; 02), choices between rules are made by an automated Efficiency Expert
known as LIBRA (see article 09; Kant, 1977), which incorporates more sophisticated
analytic techniques than the simple heuristics used by PECOS. The capability of developing
different implementations in parallel is used extensively in the interactions between PECOS
and LIBRA (Barstow & Kant, 1977).


PECOS's  Knowledge  Base 

PECOS's knowledge base consists of about 400 rules dealing with a variety of symbolic
programming concepts. The most abstract concepts are those of the specification language
(e.g., collection, inverse image, enumerating the objects in a collection). The
implementation  techniques  covered  by  the  rules  include  the  representation  of  collections  as 
linked  lists,  arrays  (both  ordered  and  unordered),  and  Boolean  mappings,  and  the 
representation  of  mappings  as  tables,  sets  of  pairs,  property  list  markings,  and  inverted 
mappings  (indexed  by  range  element).  As  a  natural  by-product,  these  rules  also  cover 
sorting  within  a  transfer  paradigm  that  includes  simpler  sorts  such  as  insertion  and  selection. 
While  some  of  the  rules  are  specific  to  LISP,  about  three-fourths  of  the  rules  are 
independent  of  LISP  or  any  other  target  language. 

Internally,  PECOS's  rules  are  represented  as  condition-action  pairs.  The  conditions  are 
particular  configurations  of  abstract  operations  and  data  structures  that  are  matched  against 
parts of the developing program. Where the match is successful, the actions replace parts
of the abstract concepts with refinements of those parts.

In the system of refinement rules, intermediate-level abstractions play a major role.
One benefit of such intermediate-level concepts is a certain economy of knowledge.
Consider, for example, the construct of a sequential collection: a linearly ordered group of
locations in which the elements of a collection can be stored. Since there is no constraint on
how the linear ordering is implemented, the construct can be seen as an abstraction (or
generalization) of both linked lists and arrays. Much of what programmers know about linked
lists also holds for arrays, and hence can be represented as one rule set about sequential
collections rather than as two, one about linked lists and one about arrays. Another benefit
of these intermediate-level concepts is that the process of choosing between alternative
(valid) rules is facilitated: Attention can be focused on the essential aspects of a choice
while ignoring irrelevant details.
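The economy of knowledge that a sequential collection provides can be sketched with a Python `Protocol`. This is an illustration, not PECOS's rule format: the method names `first`, `next`, and `content` are hypothetical. The enumerator and the membership test are written once against the abstraction and then work unchanged for both a linked list and an array.

```python
from dataclasses import dataclass
from typing import Any, Iterator, Optional, Protocol


class SequentialCollection(Protocol):
    """A linearly ordered group of locations; how the ordering is realized is left open."""
    def first(self) -> Optional[Any]: ...           # first location, or None if empty
    def next(self, loc: Any) -> Optional[Any]: ...  # following location, or None at the end
    def content(self, loc: Any) -> Any: ...         # element stored at a location


@dataclass
class Cell:
    value: Any
    rest: Optional["Cell"] = None


class LinkedList:
    """Locations are list cells; the ordering is given by the rest pointers."""
    def __init__(self, values):
        self.head = None
        for v in reversed(list(values)):
            self.head = Cell(v, self.head)

    def first(self):
        return self.head

    def next(self, loc):
        return loc.rest

    def content(self, loc):
        return loc.value


class Array:
    """Locations are indices; the ordering is given by integer succession."""
    def __init__(self, values):
        self.items = list(values)

    def first(self):
        return 0 if self.items else None

    def next(self, loc):
        return loc + 1 if loc + 1 < len(self.items) else None

    def content(self, loc):
        return self.items[loc]


def enumerate_stored_order(c: SequentialCollection) -> Iterator[Any]:
    """Written once against the abstraction; valid for both representations."""
    loc = c.first()
    while loc is not None:
        yield c.content(loc)
        loc = c.next(loc)


def is_member(x: Any, c: SequentialCollection) -> bool:
    return any(v == x for v in enumerate_stored_order(c))
```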


Rules  about  Programming  Knowledge 

Most currently available sources of programming knowledge (e.g., books and articles)
lack the precision required for effective use by a machine. The descriptions are often
informal, with details omitted and assumptions unstated. Before this programming knowledge
can be made available to machines, it must be made more precise, the assumptions must be
made explicit, and the details must be filled in.

PECOS's  rules  provide  much  of  this  precision  for  the  domain  of  elementary  symbolic 
programming.  For  example,  consider  the  following  rule  (an  English  paraphrase  of  PECOS's 
internal  representation): 

A collection may be represented as a mapping of objects to Boolean values; the default
range object is FALSE.
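The rule above can be sketched as a bitstring set in Python. This is an illustration of the characteristic-function idea, not PECOS output. It assumes the objects are integers drawn from `range(size)`, and the class name `BitSet` is hypothetical. Bit i of an integer holds the Boolean value for object i, and an unset bit is the default FALSE.

```python
class BitSet:
    """A set of integers drawn from range(size), represented by its characteristic
    function: bit i is 1 exactly when i is a member (the default, 0, plays FALSE)."""

    def __init__(self, size: int):
        self.size = size
        self.bits = 0                      # every object initially maps to FALSE

    def _check(self, i: int) -> None:
        if not 0 <= i < self.size:
            raise ValueError(f"{i} is outside range({self.size})")

    def add(self, i: int) -> None:
        self._check(i)
        self.bits |= 1 << i                # map i to TRUE

    def discard(self, i: int) -> None:
        self._check(i)
        self.bits &= ~(1 << i)             # map i back to FALSE

    def __contains__(self, i: int) -> bool:
        return 0 <= i < self.size and ((self.bits >> i) & 1) == 1
```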




Automatic  Programming 


Most programmers know this fact: that a collection may be represented by its
characteristic function. Without knowing this rule, or something similar, it is almost impossible to
understand why a bitstring can be used to represent a set (or, for that matter, why property
list markings work). Yet this rule is generally left unstated in discussions of bitstring
representations. As another example, consider the following rule:

An  association  table  whose  keys  are  integers  from  a  fixed  range  may  be  represented  as  an 
array  subregion 

The fact that an array is simply a way to represent a mapping of integers to arbitrary
values is well known and usually stated explicitly. The detail that the integers must be from
a fixed range is usually not stated. Note that if the integers are not from a fixed range, then
an array is the wrong representation and something like a hash table should be used.
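The fixed-range condition can be made explicit in a short Python sketch. This is an illustration under assumptions: `RangeTable` and `make_table` are hypothetical names, and a Python `dict` stands in for the hash table. Key k is stored in slot k − lo of a list, and a key outside the range is rejected rather than silently accepted.

```python
from typing import Any


class RangeTable:
    """Association table whose keys are integers from the fixed range [lo, hi],
    represented as a subregion of an array: key k lives in slot k - lo."""

    def __init__(self, lo: int, hi: int, default: Any = None):
        self.lo, self.hi = lo, hi
        self.slots = [default] * (hi - lo + 1)

    def _slot(self, key: int) -> int:
        if not self.lo <= key <= self.hi:
            # Outside the fixed range the array representation no longer applies.
            raise KeyError(key)
        return key - self.lo

    def __getitem__(self, key: int) -> Any:
        return self.slots[self._slot(key)]

    def __setitem__(self, key: int, value: Any) -> None:
        self.slots[self._slot(key)] = value


def make_table(key_range=None):
    """Choose a representation: an array subregion when a fixed integer range is
    known, otherwise a hash table (a Python dict)."""
    if key_range is not None:
        lo, hi = key_range
        return RangeTable(lo, hi)
    return {}
```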

PECOS's rules also identify particular design decisions involved in programming. For
example, one of the crucial decisions in building an enumerator of the objects in a sequential
collection is selecting the order in which they should be enumerated. This decision is often
made only implicitly. For example, the use of the LISP function MAPC to enumerate the
objects in a list assumes implicitly that the stored (or "natural") order is the right order in
which to enumerate them. While this is often correct, there are times when some other order
is desired. For example, the selector of a selection sort involves enumerating the objects
according to a particular ordering relation. A second major decision in building an enumerator
involves selecting a way to save the state of the computation between calls to the
enumerator. The use of a location (e.g., index or list cell) to specify the current state is
based on knowing the following rule:

If the enumeration order is the same as the stored order, the state of an enumeration may
be represented as a location in the sequential collection

Were  the  enumeration  order  different  from  the  stored  order  (as  in  a  selection  sort), 
then  some  other  state-saving  scheme  would  be  needed,  such  as  deleting  the  objects  or 
marking  them  in  some  fashion. 
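The two state-saving schemes can be contrasted in Python generators. This is an illustration, not code produced by PECOS. The function names are hypothetical, and the marking variant uses a parallel list of flags; it is one of the "marking" schemes the text mentions.

```python
from typing import Any, Callable, Iterator, Sequence


def enumerate_stored(seq: Sequence[Any]) -> Iterator[Any]:
    """Enumeration order = stored order: the state between calls is a single location."""
    loc = 0
    while loc < len(seq):
        yield seq[loc]
        loc += 1                      # advancing the location is the whole state update


def enumerate_by_order(seq: Sequence[Any],
                       key: Callable[[Any], Any] = lambda v: v) -> Iterator[Any]:
    """Enumeration order differs from stored order (the selector of a selection sort).
    A location no longer summarizes the state, so enumerated objects are marked."""
    enumerated = [False] * len(seq)   # one mark per location
    for _ in range(len(seq)):
        best = None
        for loc, v in enumerate(seq):
            if not enumerated[loc] and (best is None or key(v) < key(seq[best])):
                best = loc
        enumerated[best] = True       # mark instead of remembering a position
        yield seq[best]
```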

Another interesting aspect of PECOS's rules is that they have a certain kind of
explanatory power. Consider, for example, a well-known trick for computing the intersection
of two linked lists of atoms in linear time: Map down the first list and put a special mark on
the property list of each atom; then map down the second list, collecting only those atoms
whose property lists contain the special mark. This technique can be understood on the
basis of the following four of PECOS's rules (in addition to the rules about representing
collections as linked lists):

A  collection  may  be  represented  as  a  mapping  of  objects  to  Boolean  values;  the  default 
range  object  is  FALSE 

A  mapping  whose  domain  consists  of  atoms  may  be  represented  using  property  list 
markings 

The intersection of two collections may be implemented by enumerating the objects in one,
and while enumerating them, collecting those that are members of the other.

If a collection is input, its representation may be converted into any other representation
before further processing

Given  these  rules,  the  trick  works  by  first  converting  the  representation  of  one 
collection  from  a  linked  list  to  property  list  markings  with  Boolean  values,  and  then  computing 
the intersection in the standard way, except that a membership test for property list
markings  involves  a  call  to  GETPROP  rather  than  a  scan  down  a  linked  list. 
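The trick can be simulated in Python. This is a sketch under assumptions: a dictionary of dictionaries stands in for LISP property lists, a dictionary lookup stands in for GETPROP, and the clean-up step that removes the marks is an addition that is not part of the trick as described. The names `PROPERTY_LISTS` and `intersect_by_marking` are hypothetical.

```python
from collections import defaultdict

# Stand-in for LISP property lists (an assumption of this sketch): atom -> {indicator: value}.
PROPERTY_LISTS = defaultdict(dict)


def intersect_by_marking(first, second, mark="in-first"):
    """Intersect two lists of atoms in linear time via property-list markings."""
    # Convert the first collection from a linked list to a Boolean mapping
    # stored as property-list markings (an absent mark is the default FALSE).
    for atom in first:
        PROPERTY_LISTS[atom][mark] = True
    try:
        # Enumerate the second collection, keeping members of the first;
        # the membership test is a property lookup, not a scan down a list.
        return [atom for atom in second if PROPERTY_LISTS[atom].get(mark, False)]
    finally:
        # Not part of the trick as described: remove the marks so that
        # later calls start from a clean state.
        for atom in first:
            PROPERTY_LISTS[atom].pop(mark, None)
```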


Status 

PECOS is able to implement abstract algorithms (i.e., very high-level specifications) in
a variety of domains, including elementary symbolic programming (simple classification and
concept formation algorithms), sorting (several versions of selection and insertion sort),
graph theory (a reachability algorithm), and even simple number theory (a prime number
algorithm). In each case, PECOS's knowledge about different implementation techniques
enabled  the  construction  of  a  variety  of  alternative  implementations,  often  with  significantly 
different  efficiency  characteristics. 

PECOS's success demonstrates the viability of the knowledge-based approach to
automatic programming. In order to develop this approach further, two research directions
seem particularly useful.

First,  programming  knowledge  for  other  domains  must  be  codified.  In  the  process,  rules 
developed  for  one  domain  may  be  found  to  be  useful  in  other  domains.  With  the  hope  of 
verifying  the  wider  utility  of  PECOS's  rules  about  collections  and  mappings,  Yale's 
Knowledge-based Automatic Programming Project (Barstow, 1978) is currently codifying the
programming knowledge needed for elementary graph algorithms.

As  an  example,  consider  the  common  technique  of  representing  a  graph  as  an 
adjacency  matrix.  In  order  to  construct  such  a  representation,  only  one  rule  about  graphs 
need  be  known: 

A graph may be represented as a pair of sets: a set of vertices (whose elements are
primitive objects) and a set of edges (whose elements are pairs of vertices)

The rest of the necessary knowledge is concerned with sets and mappings and is
independent of its application to graphs. For example, in order to derive the bounds on the
matrix, one need only know that primitive objects may be represented as integers, that a set
of otherwise unconstrained integers may be represented as a sequence of consecutive
integers, and that a sequence of consecutive integers may be represented as lower and
upper bounds. To derive the representation of the matrix itself, one need only know PECOS's
rules about Boolean mappings and association tables, plus the fact that a table whose keys
are pairs of integers in fixed ranges may be represented as a two-dimensional matrix.
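The chain of representation choices can be followed step by step in Python. This is an illustration, not output of the Yale project. It assumes directed edges, vertex names that can be sorted so they can be numbered, and a hypothetical function name. Each comment corresponds to one of the rules listed above.

```python
def adjacency_matrix(vertices, edges):
    """Derive an adjacency matrix from a graph given as a pair of sets (V, E)."""
    # Primitive objects represented as integers: number the vertices consecutively.
    index = {v: i for i, v in enumerate(sorted(vertices))}
    # A sequence of consecutive integers represented by its lower and upper bounds.
    lo, hi = 0, len(index) - 1
    size = hi - lo + 1
    # The edge set as a Boolean mapping on pairs of integers in fixed ranges
    # (default FALSE), represented as a two-dimensional matrix.
    matrix = [[False] * size for _ in range(size)]
    for u, v in edges:
        matrix[index[u] - lo][index[v] - lo] = True
    return index, matrix
```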

Second, different types of programming knowledge need to be codified. Two types
seem particularly important: efficiency knowledge and strategic knowledge. LIBRA (article
09), which acts together with PECOS in PSI's synthesis phase, embodies a large amount of
efficiency knowledge, but much remains to be done. Very little work on the use of general
strategies (e.g., divide and conquer) in program synthesis has been done. The latter seems
an especially important direction, since such strategies seem to play a major role in human
programming.
G. DEDALUS

DEDALUS, the DEDuctive ALgorithm Ur-Synthesizer, accepts an unambiguous, logically
complete, very high-level specification of a desired program and, through repeated
application of transformation rules, seeks to reduce it to an implementation within a simple
LISP-like target language. This target language implementation is guaranteed to be correct
(i.e., logically equivalent to the high-level specification) and to terminate. The knowledge
that ultimately relates the constructs of the specification language to those in the target
language is expressed in the transformation rules. But of special importance are certain
rules that express general programming principles that are independent of the particular
specification  language  and  target  language.  These  rules,  which  have  constituted  a  major 
component  of  the  DEDALUS  effort,  form  conditional  statements  and  recursive  and 
nonrecursive  procedures;  they  also  generalize  procedures,  construct  well-founded  orderings 
to  guarantee  the  termination  of  recursive  calls,  and  write  code  that  simultaneously  achieves 
two  or  more  goals.  These  general  programming  principles  are  described  in  detail  in  a 
subsequent  section,  with  examples  illustrating  their  application.  As  pointed  out  in  the 
STATUS  section,  some  of  the  principles  are  fairly  well  understood,  while  others  require 
further  study.  Not  all  the  principles  are  implemented  in  the  current  DEDALUS  system. 

The DEDALUS specification language can contain constructs that are close to how the
user actually thinks about the problem. Thus, the program lessall(x l), which tests whether
a number x is less than every element of a list l of numbers, and the program gcd(x y), which
computes the greatest common divisor of two nonnegative integers x and y, are specified
in DEDALUS as follows:

lessall(x l) ⇐ compute x < all(l)

where x is a number and l is a list of numbers ,

gcd(x y) ⇐ compute max{z : z|x and z|y}

where x and y are nonnegative nonzero integers .


The all construct in P(all(l)), indicating that the condition P holds for all elements of
the list l, and the set constructor {u : P(u)}, indicating the set of elements for which P is true,
are constructs that, through the repeated application of transformation rules, will eventually
be converted into target language code that, for the particular program, is logically
equivalent to the original specification. The specification language is not fixed: New
constructs can be introduced by modifying or adding transformation rules.

The operation of DEDALUS consists of the repeated application of transformations to
expressions in order to produce expressions that are closer to, or within, the target
language. In DEDALUS, the expressions that occur during the transformation process specify
not only programs; they can also specify conditions to be proved, as well as conditions to be
made true. All these expressions are treated as goals to be achieved: For an expression
that specifies a program, the goal is to convert that program into a target language
implementation; for an expression that is a condition to be proved, the goal is to convert it to
the logical constant true; for an expression that is a condition to be made true, the goal is to
construct a program that will make that condition true.

Transforming a subexpression (of an expression) into another subexpression requires
rules of the form

t ⇒ t′ if P ,

the condition P being optional. This rule indicates that the subexpression t can be replaced
by t′. If P is present, then the rule can be applied only if the system first proves
that P is true; that is, before the rule can be applied, the system must succeed in
achieving the subgoal

Goal:  prove  P  . 

For  example,  consider 

P(all(l)) ⇒ P(head(l)) and P(all(tail(l))) if not empty(l) ,

which expresses the fact that a property P holds for every element of a nonempty list l if it
holds for the first element head(l) and for every element of the list tail(l) of the other
elements. Before the system can apply this rule to some part of an expression, it would
have to succeed in proving that l is not empty.

The application of transformation rules results in a tree of goals and subgoals. Initially,
the top-level goals of this tree are established by program specifications. Thus, the common
form of program specification

f(x) ⇐ compute P(x)

where Q(x) ,

establishes its output description as the top-level goal

Goal: compute P(x) ,

and in trying to achieve this goal, the system assumes the truth of Q(x). While the top-level
goals of trees are established by program specifications, most goals are established as the
result of transformations. Thus, by applying the transformation rule

u|v and u|w ⇒ u|v and u|w-v

to the top-level goal of the gcd program

Goal 1: compute max{z : z|x and z|y} ,

the system establishes

Goal 2: compute max{z : z|x and z|y-x}

as a subgoal. Such transformations express knowledge about specific constructs. In the
DEDALUS system there is also knowledge of a more general sort.
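The divisibility rule can be checked and used in a small Python sketch. This is an illustration only. The text does not give the final gcd program that DEDALUS derives, so the subtraction loop below is our own example of applying the rule repeatedly. It assumes that x and y are nonnegative and not both zero.

```python
def common_divisors(x: int, y: int) -> set:
    """{z : z|x and z|y} for nonnegative integers x, y that are not both zero."""
    return {z for z in range(1, max(x, y) + 1) if x % z == 0 and y % z == 0}


def gcd(x: int, y: int) -> int:
    """max{z : z|x and z|y}, computed by repeatedly applying the rule
    u|v and u|w => u|v and u|w-v, which leaves the set of common divisors unchanged."""
    while x != 0 and y != 0:
        if x <= y:
            y -= x      # the step from Goal 1 to Goal 2: replace y by y - x
        else:
            x -= y      # the symmetric instance with the roles of x and y exchanged
    return max(x, y)    # every z divides 0, so the answer is the nonzero argument
```

The assertion `common_divisors(12, 18) == common_divisors(12, 6)` illustrates that the move from Goal 1 to Goal 2 preserves the set whose maximum is sought.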


General  Programming  Principles 


This section describes five general programming principles and presents several
examples to illustrate their application. The principles express knowledge about how to form
conditionals and procedures (recursive and nonrecursive), how to replace two or more
procedures by a generalized procedure, and how to achieve simultaneous goals. As explained
in the STATUS section, the current implementation of DEDALUS does not incorporate the
generalization of procedures or the achievement of simultaneous goals.

Conditional formation. Many of the transformation rules impose some condition P (e.g., l
is nonempty, x is nonnegative) that must be satisfied for the rule to be applied. Suppose
that in attempting to apply a particular rule, the system fails to prove or disprove the
condition P, where P is expressed entirely in terms of the primitive constructs of the target
language; in such a situation, the conditional formation rule is invoked. This rule allows the
introduction of case analysis to consider separately the cases in which P is true and in which
P is false. Suppose the result is both a program segment S1 that achieves the goal under
the assumption that P is true and another program segment S2 that achieves the goal under
the assumption that P is false. The conditional formation principle puts these two program
segments together into a conditional expression

if P then S1 else S2 ,

which solves the problem regardless of whether P is true or false. During the generation of
S2, the system could discover that a conditional expression was unnecessary: The
generation of S2 may not have required the assumption that P was false. In such a case, the
program constructed would be simply S2.

Recursion formation. Suppose, in constructing a program with specifications

f(x) ⇐ compute P(x)

where Q(x) ,

the system encounters a subgoal

compute P(t) ,

which is an instance of the output specification, compute P(x). Because the program f(x) is
intended to compute P(x) for any x satisfying its input specification Q(x), the recursion
formation rule proposes achieving the subgoal by computing P(t) with a recursive call f(t).
For this step to be valid, the rule must ensure that the input condition Q(t) holds when the
proposed recursive call is executed. To ensure that the new recursive call will not cause the
program to loop indefinitely, the rule must also establish a termination condition, showing that
the argument t is strictly less than the input x in some well-founded ordering. (A well-founded
ordering is an ordering in which no infinite strictly decreasing sequences can exist.) This
condition precludes the possibility that an infinite sequence of recursive calls occurs during
the execution of the program.

Example: lessall. The DEDALUS system derived the program lessall(x l), which tests
whether a given number x is less than every element of a given list l of numbers. The
specifications for this program are

lessall(x l) ⇐ compute x < all(l)

where x is a number and l is a list of numbers .

In deriving this program, the system develops a subgoal

compute x < all(tail(l)) ,

in the case that l is nonempty. This subgoal is an instance of the original output
specification, with the input l replaced by tail(l); therefore, the recursion formation
principle proposes that the subgoal be achieved by introducing a recursive call
lessall(x tail(l)). To ensure that this step is valid, the rule establishes an input condition that

x is a number and tail(l) is a list of numbers ,

and a termination condition that the argument pair (x tail(l)) is less than the input pair (x l) in
some well-founded ordering. This termination condition holds because tail(l) is a proper
sublist of l.

As the final program, the system obtains

lessall(x l) ⇐ if empty(l) then true
else x < head(l) and lessall(x tail(l)) .
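The derived lessall program transcribes directly into Python. This is a sketch in which Python lists stand in for LISP lists. `l[0]` plays head(l), and `l[1:]` plays tail(l); unlike CDR, the slice copies. The comment notes where the termination argument from the derivation shows up.

```python
def lessall(x, l) -> bool:
    """lessall(x l) <= if empty(l) then true
                       else x < head(l) and lessall(x tail(l))"""
    if not l:                                  # empty(l)
        return True
    # head(l) is l[0] and tail(l) is l[1:]; the recursive argument is a proper
    # sublist, so len(l) strictly decreases: the well-founded ordering.
    return x < l[0] and lessall(x, l[1:])
```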


Procedure formation. Suppose that while developing a tree for a specification of the
form

f(x) ⇐ compute P(x)
where Q(x) ,

the system encounters a subgoal

Goal B: compute R(t) ,

which is an instance not of the output specification compute P(x) but of some previously
generated subgoal

Goal A: compute R(x) .

Then the procedure formation principle introduces a new procedure, g(x), whose output
specification is

g(x) ⇐ compute R(x) .

In this way, both Goals A and B can be achieved by calls g(x) and g(t) to a single procedure.
In the case where Goal B has been derived from Goal A, the call to g(t) will be a recursive
call;  otherwise,  both  calls  will  be  simple  procedure  calls. 

Example: cart. The specification of the program cart(s t) to compute the Cartesian
product of two sets, s and t, is

cart(s t) ⇐ compute {(x y) : x∈s and y∈t}
where s and t are finite sets .

While deriving the tree for the program, the system obtains a subgoal

Goal A: compute {(x y) : x=head(s) and y∈t} ,

given that s is nonempty. Developing Goal A further, the system derives

Goal B: compute {(x y) : x=head(s) and y∈tail(t)} ,

given that t is nonempty. Goal B is an instance of Goal A; therefore, the procedure formation
rule proposes introducing a new procedure carthead(s t) whose output specification is

carthead(s t) ⇐ compute {(x y) : x=head(s) and y∈t}

so that Goal A can be achieved with a procedure call carthead(s t), and Goal B with a
(recursive) call carthead(s tail(t)).

Constructing the carthead procedure by the techniques already described, the final
system of programs becomes

cart(s t) ⇐ if empty(s) then {}
else union(carthead(s t) cart(tail(s) t)) ,

carthead(s t) ⇐ if empty(t) then {}
else union({(head(s) head(t))} carthead(s tail(t))) .
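The two derived procedures can be transcribed into Python. This sketch rests on one assumption: the input sets are given as tuples without duplicates, so that head and tail are defined. The result is a Python `set`, and `|` stands in for union.

```python
def cart(s: tuple, t: tuple) -> set:
    """cart(s t) <= if empty(s) then {} else union(carthead(s t) cart(tail(s) t))"""
    if not s:
        return set()
    return carthead(s, t) | cart(s[1:], t)


def carthead(s: tuple, t: tuple) -> set:
    """carthead(s t) <= if empty(t) then {}
                        else union({(head(s) head(t))} carthead(s tail(t)))
    i.e. the pairs (head(s), y) for every y in t."""
    if not t:
        return set()
    return {(s[0], t[0])} | carthead(s, t[1:])
```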


Generalization. Suppose, in deriving a program, that we obtain two subgoals

Goal A: compute R(a(x))

Goal B: compute R(b(x)) ,

neither of which is an instance of the other, but both of which are instances of the more
general expression

compute R(y) .

In such a case, the extended procedure formation rule proposes the introduction of a new
procedure whose output specification is

g(y) ⇐ compute R(y) .

Thus,  Goal  A  and  Goal  B  can  be  achieved  by  procedure  calls  to  g(a(x))  and  g(b(x)), 
respectively. 

Example: reverse. In constructing a program reverse(l) to reverse a list l, we first
derive two subgoals:

Goal A: compute append(reverse(tail(l)) cons(head(l) nil))

Goal B: compute append(reverse(tail(tail(l))) cons(head(tail(l)) cons(head(l) nil))) .

Each is an instance of the more general expression

compute append(reverse(tail(l)) cons(head(l) m)) ;

therefore, the extended procedure formation rule proposes introducing a new procedure
reversegen(l m), whose output specification is the more general expression:

reversegen(l m) ⇐ compute append(reverse(tail(l)) cons(head(l) m)) .

Although this procedure, which reverses a nonempty list l and appends the result to m, solves
a more general problem than the original reverse program, it turns out that reversegen is
actually easier to construct. The final system of programs obtained is

reverse(l) ⇐ if empty(l) then nil
else reversegen(l nil)

reversegen(l m) ⇐ if empty(tail(l)) then cons(head(l) m)
else reversegen(tail(l) cons(head(l) m)) .
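The generalized procedure and its caller can be transcribed into Python. This is a sketch in which tuples stand in for LISP lists, `()` plays nil, and `(a,) + m` plays cons(a m); the tuple copy makes this cons O(n), unlike LISP. The docstring states the invariant that makes reversegen the easier problem: it returns the reverse of l followed by m.

```python
def reverse(l: tuple) -> tuple:
    """reverse(l) <= if empty(l) then nil else reversegen(l nil)"""
    return () if not l else reversegen(l, ())


def reversegen(l: tuple, m: tuple) -> tuple:
    """reversegen(l m) <= if empty(tail(l)) then cons(head(l) m)
                          else reversegen(tail(l) cons(head(l) m))
    Computes append(reverse(tail(l)) cons(head(l) m)) for nonempty l,
    that is, the reverse of l followed by m."""
    if not l[1:]:                              # empty(tail(l))
        return (l[0],) + m                     # cons(head(l) m)
    return reversegen(l[1:], (l[0],) + m)
```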


Simultaneous goals. In order to deal with operations that produce side effects, such
as modifying the structure of data objects (e.g., assignment statements), DEDALUS
introduces constructs such as achieve P to denote a program intended to make the
condition P true.

In constructing a program to achieve two conditions, P1 and P2, it is not sufficient to
decompose the problem by constructing two independent programs to achieve P1 and P2,
respectively. The concatenation of the two programs might not achieve both conditions,
because the program that achieves P2 may in the process make P1 false, and vice versa.

For example, suppose a program is desired to sort the values of three variables x, y,
and z; in other words, to permute the values of the variables to achieve the two conditions
x ≤ y and y ≤ z simultaneously. Assume the given primitive instruction sort2(u v), which sorts
the values of its input variables u and v. The concatenation

sort2(x y)

sort2(y z)

of these two segments will not achieve both conditions simultaneously; the second segment
sort2(y z) may, by sorting y and z, make the first condition x ≤ y false.
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The  simultaneous  goal  principle,  which  was  introduced  to  circumvent  such  difficulties, 
states  that  to  satisfy  a  goal  of  form 

achieve  PI  and  P2  , 


first  construct  a  program  F  to  achieve  PI.  then  modify  F  to  achieve  P 2  while  protecting  PI  at 
the  end  of  F  A  special  'protection  mechanism*  (cf.  (Sussman,  19/6))  ensures  that  no 
modification  Is  permitted  that  destroys  the  truth  of  the  protected  condition  PI  at  the  end  of 
the  program 

Example:  sort  To  apply  this  principle  to  the  goal 
achieve  x  <  y  and  y  <  z 

in the sorting problem, a system would first achieve x ≤ y by using the segment sort2(x
y). This program would then be modified to achieve the second condition y ≤ z. But adding
sort2(y z) at the end of the program will not work, because it destroys the truth of the
protected condition x ≤ y.

However, in general, a goal may be achieved by inserting modifications at any point in
the program, not merely at the end. Introducing the two instructions

if y ≤ x then sort2(x z)
if x ≤ y then sort2(y z)

at the beginning of the program segment would simultaneously achieve both conditions, x ≤ y
and y ≤ z. The resulting program would be

if y ≤ x then sort2(x z)
if x ≤ y then sort2(y z)
sort2(x y).
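The claim that the resulting program achieves both goals can be checked exhaustively on a small domain. The Python sketch below again uses a stand-in `sort2` that returns the ordered pair, and tries every assignment of the values 1 to 3 to x, y, and z, including ties. Testing of this kind only illustrates the claim; DEDALUS itself establishes such properties deductively.

```python
from itertools import product


def sort2(u, v):
    """Stand-in for the sort2 primitive: return u and v in nondecreasing order."""
    return (u, v) if u <= v else (v, u)


def derived_sort3(x, y, z):
    # The two inserted instructions move the largest of the three values into z.
    if y <= x:
        x, z = sort2(x, z)
    if x <= y:
        y, z = sort2(y, z)
    # The original segment, which achieves the protected condition x <= y.
    x, y = sort2(x, y)
    return x, y, z


failures = [t for t in product((1, 2, 3), repeat=3)
            if derived_sort3(*t) != tuple(sorted(t))]
print(failures)  # []
```

Dropping either conditional instruction breaks the check: with only the first, the input (1, 3, 2) is left as (1, 3, 2); with only the second, (3, 1, 2) ends as (1, 3, 2).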


Status 

Currently, the DEDALUS implementation incorporates the principles of conditional
formation, recursion formation (including the termination proofs), and procedure formation, but
it does not include generalization or the formation of structure-changing programs. The
techniques for deriving straight-line structure-changing programs were implemented in a
separate system (see Waldinger, 1977).

Conditional formation and recursion formation are well understood. The method for
proving termination of ordinary recursive calls does not always extend to the multiple-
procedure case. The generalization mechanism and the extended procedure formation
principle are just beginning to be formulated.


The derivation of straight-line programs with simple side-effects is fairly well
understood, but much work needs to be done on the derivation of structure-changing
programs with conditional expressions and loops, as well as on the derivation of programs
that alter list structures and other complex data objects.

The DEDALUS system is implemented in QLISP (Wilber, 1976), an extension of
INTERLISP (Teitelman et al., 1978) that includes pattern-matching and backtracking facilities.
The full power of the QLISP language is available in expressing each rule, since the rules are
represented as QLISP programs in a fairly direct manner.

The following are representative samples of the programs constructed to date by
the current DEDALUS system:

Numerical Programs:

- the subtractive gcd algorithm,
- the Euclidean gcd algorithm,
- the binary gcd algorithm, and
- the remainder of dividing two integers.
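For readers unfamiliar with these algorithms, the hand-written Python functions below compute what the four numerical programs compute. They are illustrations only, not DEDALUS output: they are written recursively, in the style of the programs DEDALUS derives, assume nonnegative integer arguments (and a positive divisor for the remainder), and use hypothetical names.

```python
def rem(a, b):
    """Remainder of a divided by b by repeated subtraction (a >= 0, b > 0)."""
    return a if a < b else rem(a - b, b)


def gcd_subtractive(a, b):
    """Subtractive gcd: repeatedly replace the larger argument by the difference."""
    if a == 0:
        return b
    if b == 0:
        return a
    return gcd_subtractive(a - b, b) if a >= b else gcd_subtractive(a, b - a)


def gcd_euclid(a, b):
    """Euclidean gcd: gcd(a, b) = gcd(b, a mod b), and gcd(a, 0) = a."""
    return a if b == 0 else gcd_euclid(b, rem(a, b))


def gcd_binary(a, b):
    """Binary gcd: uses only halving, subtraction, and parity tests."""
    if a == 0:
        return b
    if b == 0:
        return a
    if a % 2 == 0 and b % 2 == 0:
        return 2 * gcd_binary(a // 2, b // 2)
    if a % 2 == 0:
        return gcd_binary(a // 2, b)
    if b % 2 == 0:
        return gcd_binary(a, b // 2)
    # Both odd: the difference is even, and gcd(a, b) = gcd(|a - b| / 2, min(a, b)).
    return gcd_binary((a - b) // 2, b) if a >= b else gcd_binary(a, (b - a) // 2)


print(gcd_subtractive(84, 36), gcd_euclid(84, 36), gcd_binary(84, 36), rem(84, 36))
# 12 12 12 12
```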

List Programs:

- finding the maximum element of a list,
- testing if a list is sorted,
- testing if a number is less than every element of a list
  of numbers (lessall), and
- testing if every element of one list of numbers is less
  than every element of another.
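The list programs can be sketched the same way. The Python functions below are hand-written illustrations, not DEDALUS output; apart from lessall, which the text names, the function names are hypothetical, and Python lists stand in for LISP lists, with l[0] and l[1:] playing the roles of head and tail.

```python
def max_list(l):
    """Maximum element of a nonempty list."""
    if len(l) == 1:
        return l[0]
    rest = max_list(l[1:])
    return l[0] if l[0] >= rest else rest


def is_sorted(l):
    """True if the elements of l are in nondecreasing order."""
    return len(l) < 2 or (l[0] <= l[1] and is_sorted(l[1:]))


def lessall(x, l):
    """True if x is less than every element of l (vacuously true for an empty list)."""
    return not l or (x < l[0] and lessall(x, l[1:]))


def all_less(l, m):
    """True if every element of l is less than every element of m."""
    return not l or (lessall(l[0], m) and all_less(l[1:], m))


print(max_list([3, 9, 4]), is_sorted([1, 2, 2, 5]), lessall(0, [1, 2]), all_less([1, 2], [2, 3]))
# 9 True True False
```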

Set Programs:

- computing the union or intersection of two sets,
- testing if an element belongs to a set,
- testing if one set is a subset of another, and
- computing the Cartesian product of two sets (cart).
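The set programs admit similar sketches. In this hand-written Python illustration (not DEDALUS output), a set is represented, by assumption, as a list without repeated elements; the function names other than cart are hypothetical.

```python
def member(x, s):
    """True if x belongs to the set s."""
    return bool(s) and (x == s[0] or member(x, s[1:]))


def subset(s, t):
    """True if every element of s belongs to t."""
    return not s or (member(s[0], t) and subset(s[1:], t))


def union(s, t):
    """Elements of s that are not in t, followed by the elements of t."""
    if not s:
        return list(t)
    rest = union(s[1:], t)
    return rest if member(s[0], t) else [s[0]] + rest


def intersection(s, t):
    """Elements of s that also belong to t."""
    if not s:
        return []
    rest = intersection(s[1:], t)
    return [s[0]] + rest if member(s[0], t) else rest


def cart(s, t):
    """Cartesian product: all pairs (a, b) with a in s and b in t."""
    if not s:
        return []
    return [(s[0], b) for b in t] + cart(s[1:], t)


print(union([1, 2], [2, 3]), intersection([1, 2], [2, 3]), subset([2], [2, 3]), cart([1, 2], ["a"]))
# [1, 2, 3] [2] True [(1, 'a'), (2, 'a')]
```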


References 

See Balzer (197?), Balzer, Goldman, & Wile (1977b), Boyer & Moore (1976), Buchanan
& Luckham (1974), Burstall & Darlington (1977), Dijkstra (1976), Dijkstra (1976), Green
(1976b), Guttag, Horowitz, & Musser (1978), Heidorn (197?), Manna & Waldinger (197?),
Siklossy (1974), Sussman (1976), Teitelman et al. (1978), Waldinger (1977), Warren
(1974), Warren (197?), and Wilber (1976).




H.  PROTOSYSTEM  I 

PROTOSYSTEM I, an automatic programming system designed by William Martin, Gregory
Ruth, Robert Baron, Matthew Morgenstern, and others of the MIT Laboratory for Computer
Science, is part of a larger research project aimed at modeling, understanding, and
automating the writing of a data-processing system. Hereafter the data-processing system
is referred to as a data-processing program, in accord with this chapter's terminology, which
refers to the output of an automatic programming system as a program. A model of the larger
research project was developed that consists of five phases. The successive phases can
be viewed as a series of transformations of the descriptions of the target program, beginning
with a global conceptual description of the problem at hand and progressing, through
increasing specificity, toward a detailed machine-level solution. The aim of the project is to
develop stages of an automatic programming system, each corresponding to one of the
five phases of the model and each embodying the particular knowledge and expertise for that
phase.

Phase 1: Problem Definition--The specification of the data-processing program is
expressed in domain-dependent terms in English.

Phase 2: Specification Analysis and System Formulation--The specification from Phase 1
is viewed as a data-processing problem. This problem is solved, yielding a data-processing
formulation of the desired program.

Phase 3: Implementation--The procedural steps, data representation, and organization
of the target program are determined by intelligent selection from, and adaptation of, a set of
standard implementation possibilities.

Phase 4: Code Generation--The implementation of Phase 3 is transformed into code in
some high-level language (e.g., PL/I).

Phase 5: Compilation and Loading--The high-level code is transformed into a form that
can be "understood" and executed by the target computer.

The first two phases involve such AI areas as natural language comprehension, program
model formation, and problem solving. Since these areas are still evolving,
the development of the first two phases has been deferred. At present, PROTOSYSTEM is
limited to the automation of phases 3 and 4, since it was felt that these phases were much
more amenable to solution. Thus, the current PROTOSYSTEM accepts a specification in terms
of abstract relations (in a very high-level language called SSL), and then designs an
optimized data-processing program and generates code for an efficient implementation. In
automatic programming it is usually impossible for a system to carry out a search for the
absolutely optimal implementation; instead, a system optimizes a program only to a
degree.

The particular problem area of PROTOSYSTEM I is that of I/O-intensive (file manipulation
and updating), batch-oriented data-processing programs. Included in this area are programs
for inventory control, payroll, and other record-keeping systems.

The specification method uses a description of the desired data-processing program in
the SSL language. An SSL specification consists of a data division and a computation division.
The data division gives the names of data sets (conceptual aggregations or groupings of data),
their keys, and their period of updating; the computation division specifies, for each
computed file, the calculations to be performed when it is computed. Figure 1 illustrates an
SSL specification of a data-processing program for a warehouse inventory. In the proposed
problem, the warehouse stocks a number of different kinds of items that are sent out daily to
various stores. The data-processing program's task is to keep track of inventory levels, of
which items and how many of each item should be reordered from the producer (an item is
reordered when fewer than 100 remain in stock), and of how many items are received from the
producer. In the data division are data sets (e.g., shipments-received, beginning-inventory,
total-item-orders, etc.), and in the computation division are the computation steps that involve
these data sets (e.g., for each item, the beginning inventory is computed by adding the
shipments received to the final inventory from the previous day).

After receiving the SSL specification of the desired program, PROTOSYSTEM transforms
it into an efficient target-language implementation consisting of a collection of PL/I programs
and their JCL (Job Control Language) for the IBM 360 system. To accomplish this
transformation, the following specific design decisions are made with the goal of achieving an
efficient implementation:

(a) design each keyed file, deciding what are to be its data items, organization
    (consecutive, index sequential, regional), storage device, associated sort
    ordering, and number of records per block;

(b) design each job step, determining which computations the step is to include,
    its accessing method (sequential, random, core table), its driving data
    set(s), and the order (by key values) in which the records of its input data
    sets are to be processed;

(c) determine whether sorts are necessary and where they should be performed; and

(d) determine the sequence of job steps.

Generally, these design decisions, especially the central ones of determining the final
target data sets, computation steps, and sequencing of computation steps, are made by
exploring the different ways of combining data sets and computation steps. The system
carries out these explorations with the goal of minimizing the number of file accesses made
during the run-time of the target implementation. Sometimes, as explained below, the system
also will seek to minimize a more detailed cost estimate of the target implementation.

As described in greater detail in the next section, the method employed by PROTOSYSTEM
for achieving an efficient implementation does not rely solely on heuristics but instead uses
what is essentially a dynamic programming algorithm with heuristics added to it,
so that it can finish in a reasonable amount of time. An advantage of dynamic programming is
that it can provide a good handle on global optimization when the results of individual
decisions have far-reaching and compounding effects throughout the design of the
data-processing program.
Operation

Although the actual optimization process is performed by the optimizer module, several
other modules provide preparatory and support services. First, the structural analyzer module
generates predicates for the operations in the SSL computation division. These predicates
indicate the conditions under which data items in a data set will be either accessed or
generated during an operation. For example, the condition

(DEFINED A (k1)) = (OR (DEFINED B (k1)) (DEFINED C (k1)))

would indicate that there is a record in data set A for a value of the key, k1, only when at
least one of the data sets B or C has a record for that value of the key. The structural
analyzer also produces candidate driving data sets for each operation in the computation
division. A driving data set of an operation is a data set whose records are "walked through"
once in order of their occurrence--i.e., the operation is executed once at each step
(record)--to drive the operation.
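To make the predicate and the notion of a driving data set concrete, the hypothetical Python function below produces the keys of A from the keys of B and C in a single sequential pass. It assumes that each data set is a key-sorted list of unique keys and that the condition is read as an equivalence, so A has a record for a key exactly when B or C does. B and C are then walked through once, in key order, as driving data sets. This is an illustration, not code generated by PROTOSYSTEM I.

```python
def keys_of_a(b_keys, c_keys):
    """Merge two key-sorted lists; emit k when (OR (DEFINED B (k)) (DEFINED C (k)))."""
    i = j = 0
    out = []
    while i < len(b_keys) or j < len(c_keys):
        if j == len(c_keys) or (i < len(b_keys) and b_keys[i] < c_keys[j]):
            out.append(b_keys[i])  # key defined only in B
            i += 1
        elif i == len(b_keys) or c_keys[j] < b_keys[i]:
            out.append(c_keys[j])  # key defined only in C
            j += 1
        else:
            out.append(b_keys[i])  # key defined in both: still one record of A
            i += 1
            j += 1
    return out


print(keys_of_a([1, 3, 5], [2, 3, 6]))  # [1, 2, 3, 5, 6]
```

Each record of B and C is read exactly once, which is the kind of access pattern whose cost the optimizer tries to keep low.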

The predicates produced by the structural analyzer are then used by the question-
answering module to provide information to the optimizer about the average number of I/O
accesses implied by tentative configurations (i.e., tentative choices for the data sets and
computation steps) of the target implementation. The question-answering module maintains a
knowledge base consisting of the predicates and characteristics of the data, as well as
information obtained from interaction with the user, such as average data set size or the
probability of a predicate fragment being true. This knowledge, along with knowledge about
the probability calculus, is used to answer questions about the size of a data set and about
the average number of items in the data set that are likely to satisfy a certain predicate
(e.g., an access predicate). When the knowledge is insufficient to answer an optimizer
question, the question answerer initiates a dialogue with the user in order to elicit enough
additional information to proceed.
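As an illustration of the kind of estimate the question answerer supplies, suppose (an assumption made only for this example) that B and C have records for a given key independently, with probabilities p_B and p_C, and that A is defined by the condition shown earlier. The expected number of records in A over N keys is then N(p_B + p_C - p_B p_C). In the hypothetical Python sketch below, a probability missing from the knowledge base is obtained through an `ask` callback that stands in for the dialogue with the user.

```python
from typing import Callable, Dict


def expected_size_of_a(n_keys: int, probs: Dict[str, float],
                       ask: Callable[[str], float]) -> float:
    """Expected record count of A, where A is defined for a key iff B or C is."""
    def p(name: str) -> float:
        if name not in probs:  # insufficient knowledge: ask the user, then remember
            probs[name] = ask(f"Probability that data set {name} has a record for a key?")
        return probs[name]

    p_b, p_c = p("B"), p("C")
    # P(B or C) = P(B) + P(C) - P(B)P(C), assuming independence.
    return n_keys * (p_b + p_c - p_b * p_c)


knowledge = {"B": 0.3}
estimate = expected_size_of_a(1000, knowledge, ask=lambda question: 0.2)
print(round(estimate))  # 440
print(knowledge)        # {'B': 0.3, 'C': 0.2}
```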

The optimization process itself is performed by the optimizer module. This module
intermittently obtains information from the question answerer about I/O accesses of
tentative configurations of parts of the data-processing program, in order to explore the
effects of such design parameters as the number of records per block, the file organization,
the data items that are collected into a single data set, and the computations that are
performed during a single reading of a file or files. Since the problem area of PROTOSYSTEM
is that of I/O-intensive programs, the optimizer explores the various design parameters with
the goal of minimizing the number of file accesses of the target language implementation (of
the data-processing program). Sometimes, however, after a number of more important design
decisions have been made, the optimizer will explore design decisions by computing a more
detailed cost estimate that attempts to approximate the charging structure of the particular
installation on which the target system is to run (e.g., disc space, core residency charges,
explicit I/O, etc.).

The central part of the optimization process is concerned with the exploration of
various ways of setting up data sets and computation steps. Basically, the optimization
module starts with the data sets and computation steps in the data division and computation
division of the SSL specification. Then, with the goal of minimizing the number of file
accesses, the module looks at data-processing programs that use various aggregations of
these initial data sets and computation steps (an aggregation of two or more data sets is a
data set that has all the data items of the original data sets, while an aggregation of several
computation steps is a computation step that performs the functions of the original steps).
The optimizer explores aggregating data sets and aggregating computation steps, and it
develops and utilizes constraints on the sort order of both data sets and computation steps
(an example of a sort-order constraint on a data set would be a requirement that its records
be sorted on a particular key first).

To avoid the problem of combinatorial explosion, the module uses a form of dynamic
programming with heuristics. Loosely speaking, one may say that dynamic programming is a
set of parameterized recursive equations, which, in this case, express the cost of optimized
longer segments of the program in terms of optimized shorter segments. A pure dynamic
programming algorithm, though it would find the absolute optimum target implementation, would
require an extreme amount of time to do so. Therefore, in order that the algorithm finish in a
reasonable time, a number of heuristics have been employed in the algorithm, including
decoupling decisions where possible (and sometimes even where it is not completely
possible) and carrying out local optimizations before making adjustments for global concerns.
A full explanation of the algorithm is found in Morgenstern (1976).
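The recursive equations can be illustrated on one simplified decision: grouping a fixed sequence of computation steps into job steps. The Python sketch below is not Morgenstern's algorithm; it is a pure dynamic program under an entirely hypothetical cost model in which a job step costs one access per distinct data set it touches, plus a penalty of 10 for each additional key order it must accommodate. The cost best[j] of the cheapest grouping of the first j steps is expressed through shorter prefixes as best[j] = min over i < j of best[i] + cost(i, j).

```python
from typing import List, Set, Tuple

Step = Tuple[str, Set[str]]  # (key order, data sets accessed); hypothetical model


def best_grouping(steps: List[Step], penalty: int = 10) -> Tuple[float, List[List[int]]]:
    n = len(steps)

    def cost(i: int, j: int) -> int:
        """Cost of running steps i..j-1 together as one job step."""
        files = set().union(*(f for _, f in steps[i:j]))
        orders = {k for k, _ in steps[i:j]}
        return len(files) + penalty * (len(orders) - 1)

    best = [0] + [float("inf")] * n  # best[j]: cheapest cost for the first j steps
    cut = [0] * (n + 1)              # cut[j]: where the last job step begins
    for j in range(1, n + 1):
        for i in range(j):
            c = best[i] + cost(i, j)
            if c < best[j]:
                best[j], cut[j] = c, i

    groups, j = [], n
    while j > 0:  # walk the cut points back to recover the chosen job steps
        groups.append(list(range(cut[j], j)))
        j = cut[j]
    return best[n], groups[::-1]


steps = [("ITEM", {"A", "B"}), ("ITEM", {"B", "C"}), ("STORE", {"C", "D"})]
print(best_grouping(steps))  # (5, [[0, 1], [2]])
```

The pure program examines every contiguous grouping, O(n^2) segments; the heuristics described above exist precisely because the real design space, with data-set aggregation, file organizations, and sort orders, is far larger than this.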


Status 

The SSL specification language has been completely defined, and there is an
operational implementation of PROTOSYSTEM in MACLISP on the MIT LCS PDP-10. The system
is capable of producing acceptable target language implementations. From a larger
perspective, the PROTOSYSTEM I project has developed a 5-phase model of the process of
writing a data-processing program (system), from its conception to its implementation as
executable code. Twenty years ago, the fifth phase, compilation and loading, was
automated. At present, a preliminary theory and automation of the third and fourth phases,
the generation of the system and its translation into high-level code, are embodied in
PROTOSYSTEM I. It is felt that within the next decade the theory and automation of the
remaining two phases, problem definition and specification analysis and system
formulation, should easily fall within the realm of presently developing AI technologies.

DATA DIVISION

FILE SHIPMENTS-RECEIVED
KEY IS ITEM
GENERATED EVERY DAY
FILE BEGINNING-INVENTORY
KEY IS ITEM
GENERATED EVERY DAY
FILE TOTAL-ITEM-ORDERS
KEY IS ITEM
GENERATED EVERY DAY
FILE QUANTITY-SHIPPED-TO-STORE
KEY IS ITEM, STORE
GENERATED EVERY DAY
FILE QUANTITY-ORDERED-BY-STORE
KEY IS ITEM, STORE
GENERATED EVERY DAY
FILE TOTAL-SHIPPED
KEY IS ITEM
GENERATED EVERY DAY
FILE FINAL-INVENTORY
KEY IS ITEM
GENERATED EVERY DAY
FILE REORDER-AMOUNT
KEY IS ITEM
GENERATED EVERY DAY
COMPUTATION DIVISION

BEGINNING-INVENTORY IS
    FINAL-INVENTORY (from the previous day) + SHIPMENTS-RECEIVED
TOTAL-ITEM-ORDERS IS SUM OF QUANTITY-ORDERED-BY-STORE FOR EACH ITEM
QUANTITY-SHIPPED-TO-STORE IS
    QUANTITY-ORDERED-BY-STORE IF BEGINNING-INVENTORY IS
        GREATER THAN TOTAL-ITEM-ORDERS
    ELSE
    QUANTITY-ORDERED-BY-STORE
        * (BEGINNING-INVENTORY / TOTAL-ITEM-ORDERS)
        IF BEGINNING-INVENTORY IS NOT
        GREATER THAN TOTAL-ITEM-ORDERS
TOTAL-SHIPPED IS SUM OF QUANTITY-SHIPPED-TO-STORE FOR EACH ITEM
FINAL-INVENTORY IS BEGINNING-INVENTORY - TOTAL-SHIPPED
REORDER-AMOUNT IS 1000 IF FINAL-INVENTORY IS LESS THAN 100.


Figure 1: SSL relational description for a data-processing program.
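To trace what the relations in Figure 1 compute, here is a hand-written Python rendering of one day of the computation division. It is not what PROTOSYSTEM I generates (its output is PL/I and JCL, with file organizations and job steps chosen by the optimizer); it simply evaluates the relations in memory. The dictionaries keyed by item or by (item, store) pairs, the use of ordinary division for rationing (the figure does not say how fractional quantities are rounded), the guard against 0/0, and the decision to create a REORDER-AMOUNT entry only for items below 100 are all assumptions of this sketch.

```python
from collections import defaultdict
from typing import Dict, Tuple


def run_day(final_prev: Dict[str, float], received: Dict[str, float],
            ordered: Dict[Tuple[str, str], float]) -> Dict[str, dict]:
    """One day's computation division, keyed by ITEM or by (ITEM, STORE)."""
    items = set(final_prev) | set(received) | {item for item, _ in ordered}

    # BEGINNING-INVENTORY = previous FINAL-INVENTORY + SHIPMENTS-RECEIVED
    beginning = {i: final_prev.get(i, 0) + received.get(i, 0) for i in items}

    # TOTAL-ITEM-ORDERS = sum of QUANTITY-ORDERED-BY-STORE for each item
    total_orders = defaultdict(float)
    for (item, _store), qty in ordered.items():
        total_orders[item] += qty

    # QUANTITY-SHIPPED-TO-STORE: fill orders in full if stock suffices, else ration
    shipped = {}
    for (item, store), qty in ordered.items():
        if beginning[item] > total_orders[item]:
            shipped[(item, store)] = qty
        elif total_orders[item] == 0:
            shipped[(item, store)] = 0  # guard added here; the figure leaves 0/0 undefined
        else:
            shipped[(item, store)] = qty * beginning[item] / total_orders[item]

    # TOTAL-SHIPPED = sum of QUANTITY-SHIPPED-TO-STORE for each item
    total_shipped = defaultdict(float)
    for (item, _store), qty in shipped.items():
        total_shipped[item] += qty

    final = {i: beginning[i] - total_shipped[i] for i in items}
    reorder = {i: 1000 for i in items if final[i] < 100}
    return {"final_inventory": final, "reorder_amount": reorder, "shipped": shipped}


result = run_day({"bolt": 50, "nut": 500}, {"bolt": 100},
                 {("bolt", "s1"): 100, ("bolt", "s2"): 200, ("nut", "s1"): 50})
print(sorted(result["final_inventory"].items()), result["reorder_amount"])
# [('bolt', 0.0), ('nut', 450.0)] {'bolt': 1000}
```

Here the 150 bolts on hand cannot cover the 300 ordered, so each store receives half of its order, the final inventory drops to 0, and 1000 bolts are reordered.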


References 

See Baron (1977), Morgenstern (1976), Ruth (1976a), Ruth (1978), and Ruth (197?).
I. NLPQ: Natural Language Programming for Queuing Simulations

The Natural Language Programming for Queuing Simulations (NLPQ) project was begun
by George Heidorn at Yale University in 1967 as a doctoral dissertation and completed at the
Naval Postgraduate School during the years 1968-1972. The problem area is that of
simulation programs for simple queuing problems. The queuing problem's specification occurs
during an English dialogue in which the user and the NLPQ system each can furnish
information to, and request information from, the other. From this dialogue, the NLPQ system
creates and maintains a partial internal description of the queuing problem. This partial
description is used to answer any questions that the user may ask; it is used to generate
questions that are to be asked of the user; and, when eventually completed by the dialogue
activity, it is used to generate the implementation of the simulation program in the target
language GPSS. The system's processing--including creating the problem description and
generating the GPSS program, as well as translating and generating sentences--is specified
by production rules.


Specification 

In the English dialogue that constitutes NLPQ's method of specification, the user can
make statements, give commands, ask questions, and answer questions. During this process
the system can ask and answer questions and respond to commands. In the very brief
dialogue of Figure 1, most of the features of the specification method are illustrated in a
simple way (other, more complex dialogues are presented in Heidorn, 1972, Heidorn, 1974,
Heidorn, 1975b, and Heidorn, 1976). In the dialogue, line numbers have been added for
purposes of reference; the lines typed by the user are those following the ** prompt, and
the remaining lines were typed by the computer.

The particular specification concerns a queuing problem about cars that arrive at a
station, get serviced, and leave. The user initially volunteered some information about how
often the cars arrive and about the sequencing of actions, in lines 1 and 2, and then asked
the system what other information was needed to complete the queuing model, in line 3. In
lines 4 through 9, the system asked questions about times and actions, which the user
answered, until the problem statement was complete, in line 10. Then in line 11 the user
asked a specific question about the queuing model concerning the mean time needed to
service a car, a question that the system answered in line 12. In line 13 the user told the
system to produce a description of the entire problem in English for verification purposes,
which it did, in line 14. Finally, in response to the user command in line 15, the system
produced the GPSS program, in line 16, complete with English comments (preceded by *) and
meaningful symbolic names. The time required for the NLPQ system to process this sample
specification and to produce the GPSS program was 8 seconds of virtual CPU time on an IBM
370/168 running LISP.

1  ** EVERY 6 MINUTES A CAR ARRIVES AT A STATION.

2  ** AFTER ARRIVING, IT IS SERVICED AT A PUMP.

3  ** IS THE PROBLEM STATEMENT COMPLETE?

4  How long are the cars serviced at the pump?

5  ** FROM 3 TO 7 MINUTES.

6  After being serviced at the pump, what do the cars do?

7  ** THEY LEAVE THE STATION.

8  How long shall the simulation be run?

9  ** 10 HOURS; USE 1 MINUTE AS THE BASIC TIME UNIT.

10 The problem statement is complete.

11 ** WHAT IS THE MEAN OF THE TIME TO SERVICE CARS?

12 The mean of the uniform distribution is 5 minutes.

13 ** DESCRIBE THE PROBLEM IN ENGLISH.

14 The cars arrive at the station every 6 minutes.

   After arriving at the station, the cars are serviced at the
   pump. The time for the cars to be serviced at the pump is
   uniformly distributed, with a mean of 5 minutes and a
   half-range of 2 minutes. After being serviced at the pump,
   the cars leave the station.

   The simulation is to be run for 10 hours, using a
   basic time unit of 1 minute.

15 ** WRITE THE GPSS PROGRAM.

16
         SIMULATE
STAT1    EQU       1,F,Q
PUMP2    EQU       2,F,Q
CAR1     EQU       1,T
1        TABLE     M1,1,1,2
*
*  THE CARS ARRIVE AT THE STATION.
         GENERATE  6
         ASSIGN    1,CAR1
*
*  THE CARS ARE SERVICED AT THE PUMP.
ACT1     QUEUE     PUMP2
         SEIZE     PUMP2
         DEPART    PUMP2
         ADVANCE   5,2
         RELEASE   PUMP2
*
*  THE CARS LEAVE THE STATION.
ACT3     TABULATE  P1
         TERMINATE
*
*  TIMING LOOP
         GENERATE  600
         TERMINATE 1
         START     1
         END


Figure 1. A very brief NLPQ dialogue.


Operation 

The processing to be done by NLPQ is specified by sets of production rules written in a
language designed especially for this system. "Decoding" rules specify how strings of
English text are to be converted into records in a semantic net, and "encoding" rules specify
how records are to be converted into text. These rules are basically phrase structure
grammar rules (Natural Language.B1), but they are augmented with arbitrary conditions and
structure-building actions.

The representation of the internal description of the simulation problem, as well as the
representation of the syntactic and semantic structures, is in the form of a semantic
network (Representation.B2). A network consists of records that represent such things as
concepts, words, physical entities, and probability distributions. Each record is a list of
attribute-value pairs, where the value of an attribute is usually a pointer to another record
but may sometimes be simply a number or character string.

Prior to a queuing dialogue, the system is given a network of about 300 "named"
records containing information about words and concepts relevant to simple queuing
problems. It is also furnished with a set of about 300 English decoding rules and 600 English
and GPSS encoding rules. As the dialogue progresses, the system uses the information it
obtains from the English dialogue to build and complete a partial description of the desired
simulation, a description that is in the form of a network called the Internal Problem
Description (IPD).

Basically, an IPD network describes the flow of mobile entities, such as vehicles,
through a framework consisting of stationary entities, such as pumps, by specifying the
actions that take place in the framework and their interrelationships. Each action is
represented by a record whose attributes furnish such information as the type of action, the
entity doing the action (i.e., the agent), the entity that is the object of the action, the
location where it happens, its duration, its frequency of occurrence, and what happens next.
For example, the action "The men unload the truck at a dock for two hours" could be
represented by the record:


R1:
    Type       unload
    Agent      men
    Object     truck
    Location   dock
    Duration   2 hours
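The record R1 can be pictured as a small linked data structure. In the Python sketch below, which is only an illustration and not NLPQ's own representation, each record is a name plus a dictionary of attribute-value pairs; most values are pointers to other records, and a few are plain numbers. The concept records for unload, men, truck, dock, and hour, and the separate duration record, are hypothetical stand-ins for records NLPQ would already hold in its network.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class Record:
    """A semantic-network record: a name plus attribute-value pairs."""
    name: str
    attrs: Dict[str, Union["Record", int, float, str]] = field(default_factory=dict)


def follow(record: Record, *path: str):
    """Follow a chain of attribute pointers, e.g. follow(r, "duration", "unit")."""
    value = record
    for attr in path:
        value = value.attrs[attr]
    return value


# Hypothetical word/concept records that R1 points to.
unload, men, truck, dock, hour = (Record(w) for w in ("unload", "men", "truck", "dock", "hour"))
two_hours = Record("two-hours", {"value": 2, "unit": hour})  # a number plus a pointer

r1 = Record("R1", {"type": unload, "agent": men, "object": truck,
                   "location": dock, "duration": two_hours})

print(follow(r1, "location").name, follow(r1, "duration", "value"),
      follow(r1, "duration", "unit").name)  # dock 2 hour
```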


From the English dialogue the NLPQ system must obtain all the information needed to
build the IPD. Thus, the user must describe the flow of mobile entities through the queuing
model by making statements about the actions that take place and about the relations
between these actions. Each mobile entity must "arrive" at or "enter" the model. Then it
may go through one or more other actions, such as "service," "load," "unload," and "wait."
Then, typically, it "leaves" the model. The order in which these actions take place must
eventually be made explicit by the use of subordinate clauses beginning with such
conjunctions as "after," "when," and "before," or by using the adverb "then." If the order of
the actions depends on the state of the queuing model, an "if" clause may be used to
specify the condition for performing an action; a sentence with an "otherwise" in it is used
to give an alternative action to be performed when this condition is not met.

The information needed to simulate the problem, including the various times involved,
must also be furnished by the English dialogue. It is necessary to specify the time between
arrivals, the time required to perform each activity, the length of the simulation run, and the
basic time unit to be used in the GPSS program. Inter-event and activity times may be given
as constants or as probability distributions, such as uniform, exponential, normal, or empirical.
The quantity of each stationary entity should also be specified, unless 1 is to be assumed.

The user may either furnish this information in the form of a complete problem
statement or state some part of it and then let the system ask questions to obtain the rest
of the information, as was done above in lines 1 through 10 of Figure 1. The latter method
results in a scan of the partially built IPD for missing or erroneous information and the
generation of appropriate questions. Each time the system asks a question, it is trying to
obtain the value of some specific attribute that will be needed to generate a GPSS program.
To furnish a value for the attribute, the question may be answered by a complete sentence
or simply by a phrase.
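The scan can be sketched as follows. The required-attribute table, the question templates, and the dictionary layout of the IPD are all hypothetical simplifications (NLPQ works with encoding rules over a semantic network, not string templates). The example IPD is roughly what the system would hold after lines 1 and 2 of Figure 1, and the generated questions correspond to lines 4, 6, and 8.

```python
from typing import Dict, List

# Hypothetical table: attributes each action type needs before GPSS can be generated.
REQUIRED = {"arrive": ["inter_event_time"], "service": ["duration", "successor"], "leave": []}

# Hypothetical question templates, one per missing attribute.
TEMPLATES = {
    "inter_event_time": "How often do the {entity} arrive at the {location}?",
    "duration": "How long are the {entity} {participle} at the {location}?",
    "successor": "After being {participle} at the {location}, what do the {entity} do?",
}


def questions_for(ipd: Dict) -> List[str]:
    """Scan a partial IPD and return one question for each missing value."""
    questions = []
    for action in ipd["actions"]:
        for attr in REQUIRED[action["type"]]:
            if action.get(attr) is None:
                questions.append(TEMPLATES[attr].format(**action))
    if ipd.get("run_time") is None:
        questions.append("How long shall the simulation be run?")
    return questions


# Roughly the state after lines 1 and 2 of Figure 1.
ipd = {
    "actions": [
        {"type": "arrive", "entity": "cars", "location": "station", "inter_event_time": 6},
        {"type": "service", "entity": "cars", "participle": "serviced", "location": "pump"},
    ],
    "run_time": None,
}
for q in questions_for(ipd):
    print(q)
```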

The user may ask the system specific questions about the queuing model, and then the
system generates the answers from the information in the appropriate parts of the IPD. In
order to check the entire IPD as it exists at any time, the user may request that an English
problem description be produced. Such a description consists of all the information in the IPD
as it is converted into English by the encoding rules (see line 14 of Figure 1). Specifically,
for each action in the IPD, the system generates one or more statements describing the type
of action, its agent, object, location, what action if any follows (if none, a new paragraph is
started), and, if applicable, an inter-event time or duration. Conditional successor actions
may result in two sentences, with the first one having an "if" clause in it and the second one
beginning with "otherwise." After all of the actions have been described, a separate one-
sentence paragraph is produced with the values of the run time and the basic time unit.
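The generation of the English description can be sketched in the same spirit. In the hypothetical Python below, each action is a dictionary whose "next" field plays the role of the successor attribute, and fixed sentence patterns stand in for NLPQ's encoding rules. The sketch follows a single chain of successors and does not handle conditional successors; with the actions of Figure 1 it reproduces the description in line 14.

```python
from typing import Dict, List, Optional


def after_clause(prev: Dict) -> str:
    """Participial phrase naming the preceding action (hypothetical encoding rule)."""
    if prev["type"] == "arrive":
        return f"arriving at the {prev['location']}"
    return f"being serviced at the {prev['location']}"


def describe(actions: List[Dict], run_hours: int, unit_minutes: int) -> str:
    sentences = []
    prev: Optional[Dict] = None
    index: Optional[int] = 0
    while index is not None:  # follow the successor pointers
        a = actions[index]
        e, loc = a["entity"], a["location"]
        if a["type"] == "arrive":
            sentences.append(f"The {e} arrive at the {loc} every {a['every']} minutes.")
        elif a["type"] == "service":
            sentences.append(f"After {after_clause(prev)}, the {e} are serviced at the {loc}.")
            sentences.append(
                f"The time for the {e} to be serviced at the {loc} is uniformly distributed, "
                f"with a mean of {a['mean']} minutes and a half-range of {a['half_range']} minutes.")
        elif a["type"] == "leave":
            sentences.append(f"After {after_clause(prev)}, the {e} leave the {loc}.")
        prev, index = a, a["next"]
    unit = "minute" if unit_minutes == 1 else "minutes"
    closing = (f"The simulation is to be run for {run_hours} hours, "
               f"using a basic time unit of {unit_minutes} {unit}.")
    return " ".join(sentences) + "\n\n" + closing  # run-time sentence as its own paragraph


actions = [
    {"type": "arrive", "entity": "cars", "location": "station", "every": 6, "next": 1},
    {"type": "service", "entity": "cars", "location": "pump", "mean": 5, "half_range": 2, "next": 2},
    {"type": "leave", "entity": "cars", "location": "station", "next": None},
]
print(describe(actions, run_hours=10, unit_minutes=1))
```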

After the dialogue is finished and all the required information has been obtained, NLPQ uses
the IPD and the GPSS encoding rules to produce the desired program in the GPSS target
language. Such a program was listed at line 16 of Figure 1. At the beginning of this program,
the definitions for the stationary entities, mobile entities, and distributions are given. Then,
for each action, a comment consisting of a simple English action sentence is produced,
followed by the GPSS statements appropriate to this action. For example, an "arrive" usually
produces a GENERATE and an ASSIGN; a "leave" produces a TABULATE and a TERMINATE; and
most activities produce a sequence like QUEUE, SEIZE, DEPART, ADVANCE, and RELEASE.
These are usually followed by some sort of TRANSFER, depending upon the type of value that
the action's successor attribute has. Finally, the GPSS program closes with a "timing loop"
to govern the length of the simulation run.
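The action-to-block mapping above can be sketched as a small table-driven generator. This is a simplified, hypothetical illustration: operands are omitted, the definitions section is left out, every successor becomes an unconditional `TRANSFER ,label`, and the closing timing loop uses the common GPSS timer-segment idiom (`GENERATE` the run time, `TERMINATE 1`, `START 1`) rather than anything taken from NLPQ itself.

```python
from typing import List, Optional, Tuple

# Block sequences named in the text for each kind of action (operands omitted).
BLOCKS = {
    "arrive": ["GENERATE", "ASSIGN"],
    "leave": ["TABULATE", "TERMINATE"],
    "activity": ["QUEUE", "SEIZE", "DEPART", "ADVANCE", "RELEASE"],
}


def emit_gpss(actions: List[Tuple[str, str, Optional[str]]], run_time: int) -> List[str]:
    """actions: (kind, English comment sentence, successor label or None)."""
    lines: List[str] = []
    for kind, sentence, successor in actions:
        lines.append(f"* {sentence}")  # comment line: a simple English action sentence
        lines.extend(BLOCKS[kind])
        if successor is not None:
            # Simplification: always an unconditional transfer to the successor's label.
            lines.append(f"TRANSFER ,{successor}")
    # Timing loop: a timer transaction terminates the run after run_time units.
    lines += ["* Timing loop", f"GENERATE {run_time}", "TERMINATE 1", "START 1"]
    return lines
```

A real generator would choose among TRANSFER forms according to the kind of value in the successor attribute, as the text notes; the table lookup is what keeps the per-action step simple.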


Status 

Though this project was "completed," a system ready for production use was not developed.
The NLPQ prototype, however, was demonstrated several times on a variety of problems.
Although the capabilities of the implemented system are limited, the research did establish an
overall framework for such a system, and useful techniques were developed. Enough details
were worked out to enable the system to carry out interesting interactions, as evidenced by
the longer, more complicated dialogues found in the first four references at the end of this
article. More details of the processing done by this system can be found in any of the
references, especially Heidorn, 1972, which is a 370-page technical report.
J. LIBRA

LIBRA, the efficiency-analysis expert of the PSI system (Article D2), is being developed by
Elaine Kant in conjunction with the PSI project at Systems Control, Inc., and at Stanford
University. The PSI system, through interaction with the user, constructs a very high-level
program specification called the program model. Then LIBRA, working together with the
PECOS coding expert (Article D5), converts the program model into a target language
implementation. The PECOS system supplies the transformation rules that can convert the
program model into various target language implementations. Using global efficiency analysis
("global analysis" is analysis with access to the entire program, as opposed to only a local
segment), LIBRA directs and explores the application of the transformation rules so as to
produce an efficient implementation.

The transformation process itself consists of repeated applications of transformation rules to
parts of the program, where every application results in a specification closer to a target
language implementation. Each such application of a rule is said to produce a partial
implementation, or refinement, of the program, and the transformation rules are called
refinement rules. Thus refinement rules applied to refinements produce further refinements.
Because more than one refinement rule may be applicable to the same part of a refinement,
the transformation process produces a tree of possible refinements (the actual situation is
slightly more complicated, since the order in which the rules are applied can affect the tree
that is produced). To avoid the problem of combinatorial explosion, LIBRA develops only part
of the tree. A discussion of the details of this process follows.
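A toy Python model makes the tree growth concrete. The rules and construct names below are hypothetical (real PECOS rules are far richer); the point is that applying every applicable rule at every node yields a tree whose size grows quickly, and that different application orders can reach the same refinement along different branches.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical refinement rules: (abstract construct, one possible refinement of it).
RULES: List[Tuple[str, str]] = [
    ("COLLECTION", "list"),
    ("COLLECTION", "array"),
    ("LOOP", "while-loop"),
]


@dataclass
class Refinement:
    program: Tuple[str, ...]  # abstract parts are upper case, refined parts lower case
    children: List["Refinement"] = field(default_factory=list)


def expand(node: Refinement) -> List[Refinement]:
    """Apply every applicable rule once; each application yields one child refinement."""
    for i, part in enumerate(node.program):
        for old, new in RULES:
            if part == old:
                refined = node.program[:i] + (new,) + node.program[i + 1:]
                node.children.append(Refinement(refined))
    return node.children


def grow(node: Refinement) -> int:
    """Expand the whole tree (what LIBRA avoids doing) and return its node count."""
    return 1 + sum(grow(child) for child in expand(node))


def leaves(node: Refinement) -> List[Tuple[str, ...]]:
    if not node.children:
        return [node.program]
    return [p for child in node.children for p in leaves(child)]
```

Starting from `("COLLECTION", "LOOP")`, full expansion produces 8 nodes and 4 leaves, but only 2 distinct final programs: the duplicates come purely from the order in which the two choices were made.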

It is LIBRA's function to analyze and guide the development of the refinement tree in order to
achieve an efficient implementation. LIBRA determines what parts of the program to expand
next and what parts not to expand at all. In particular, when more than one refinement rule is
applicable, LIBRA may decide to apply them all so that the resulting refinements can be
considered in greater detail, or it may decide to apply only one of the rules. In the latter case,
the refinement is implemented directly in the current node of the tree, and the other
possibilities are permanently forgone.

One of the most important ways in which LIBRA attacks the problem of combinatorial explosion
is by estimating the efficiency of possible target language implementations. For each
refinement in the tree, LIBRA maintains two cost estimates; the estimates are in the form of
symbolic algebraic expressions that give the time and space required to execute a certain kind
of target language implementation. The first estimate is the default cost, which might result if
all the constructs and operators in the refinement were assigned default implementations. The
second is the optimistic cost estimate, which might result assuming (a) that certain efficient
implementation techniques that have worked in similar situations will prove successful in the
present situation, and (b) that LIBRA expends enough of its own resources of time and space
to carry out these implementation techniques.
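A minimal Python sketch of the pair of estimates kept for each refinement. Python functions of the set size `n` stand in for LIBRA's symbolic algebraic expressions, and the figures (100 membership tests, a list scan as the default, a hash table as the optimistic case) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Estimate:
    formula: str                # the symbolic expression, kept for display
    time: Callable[[int], int]  # time as a function of the set size n
    space: Callable[[int], int] # space as a function of the set size n


@dataclass
class RefinementCost:
    default: Estimate     # every construct gets its default implementation (upper bound)
    optimistic: Estimate  # efficient techniques succeed and are fully paid for (lower bound)

    def time_bounds(self, n: int) -> Tuple[int, int]:
        return self.optimistic.time(n), self.default.time(n)


# Hypothetical refinement: M membership tests against a set of n elements.
M = 100
membership = RefinementCost(
    default=Estimate("time M*n, space n (list scan)", lambda n: M * n, lambda n: n),
    optimistic=Estimate("time M, space n (hash table)", lambda n: M, lambda n: n),
)
```

With `n = 50`, `membership.time_bounds(50)` gives `(100, 5000)`: any implementation of this refinement is expected to cost somewhere in that range.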

Treating these two costs as upper and lower bounds on the costs of possible target language
implementations of the refinement, LIBRA obtains important guidance in directing the growth
of the refinement tree. These bounds can be used to prune a branch of the refinement tree
(without further consideration of the branch) or to calculate the effect of a partial
implementation decision on the global program cost. As discussed below in the Rules section,
the upper and lower bounds are also used to direct attention to high-impact areas, those areas
where effort is likely to yield the greatest increases in overall efficiency.
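Both uses of the bounds fit in a short Python sketch, assuming costs are plain numbers and that the costs of a program's parts simply add (a simplification; LIBRA's combination rules depend on the constructs involved). A branch is pruned when its optimistic cost is worse than some other branch's default cost, and fixing one part's bounds shows the effect of that decision on the whole program.

```python
from typing import Dict, List, Tuple

Node = Tuple[str, int, int]  # (name, optimistic lower bound, default upper bound)


def prune(nodes: List[Node]) -> List[Node]:
    """Drop nodes whose lower bound is worse than some node's upper bound."""
    best_upper = min(upper for _, _, upper in nodes)
    # A node's own lower bound never exceeds its own upper bound, so this only
    # removes nodes that are beaten by a different node.
    return [node for node in nodes if node[1] <= best_upper]


def global_bounds(parts: Dict[str, Tuple[int, int]]) -> Tuple[int, int]:
    """Bounds on the whole program's cost, assuming part costs simply add."""
    return (sum(lo for lo, _ in parts.values()), sum(hi for _, hi in parts.values()))
```

For example, with branches `A` (10 to 40), `B` (50 to 90), and `C` (30 to 35), branch `B` is pruned because even its best case is worse than `C`'s default. Narrowing a `lookup` part from (100, 5000) to (100, 150) shrinks the program's upper bound from 6000 to 1150 when a `sort` part costs (700, 1000).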
Another feature of the LIBRA system, implicit in the above, is the knowledge LIBRA has about
the use and limits of its own resources of available time and space. This feature is important
because no system can devote unlimited effort to finding an efficient implementation; effort
must be allocated. LIBRA performs this allocation by assigning available resources to
high-impact areas, where the resources will do the most good. The Rules section presents the
method used to compute impact, as well as examples and uses of resource knowledge.
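One way to turn "assign resources to high-impact areas" into arithmetic is sketched below. It takes impact to be the gap between an area's default (upper) and optimistic (lower) estimates, one of the measures named in the Rules section, and splits the budget in proportion to that gap. The proportional split and the area names are assumptions for illustration, not LIBRA's documented policy.

```python
from typing import Dict, Tuple


def allocate(areas: Dict[str, Tuple[float, float]], budget: float) -> Dict[str, float]:
    """Split a resource budget among areas in proportion to their impact.

    areas maps a name to (optimistic lower bound, default upper bound);
    impact is taken to be the gap between the two bounds.
    """
    impact = {name: hi - lo for name, (lo, hi) in areas.items()}
    total = sum(impact.values())
    if total == 0:
        return {name: 0.0 for name in areas}  # nothing left to gain anywhere
    return {name: budget * gap / total for name, gap in impact.items()}
```

An area whose estimates span 100 to 5000 receives far more effort than one spanning 50 to 60, where even a perfect decision could save little.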

LIBRA also includes mechanisms to assist in the acquisition of new programming concepts.
When new high-level constructs are added (such as new types of sorts or trees), new efficiency
knowledge is needed to analyze these concepts (their subparts, running times, data structure
accesses, and so on). LIBRA has a model of programming concepts that is consulted when new
concepts are added. Some of the necessary information can be deduced automatically, and the
user is asked specific questions to obtain the rest. To help construct the cost estimation
functions for these concepts, LIBRA provides a semi-automatic procedure for deriving them
from the set of cost functions for the target language constructs.
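The derivation step can be illustrated by composing a new concept's cost function from per-construct costs. In this Python sketch the primitive costs are hypothetical unit figures for LISP-like operations, sequences add their parts' costs, and every loop is assumed to run `n` times. An unknown primitive raises an error, which marks where a semi-automatic procedure would stop and ask the user.

```python
from typing import Callable, Tuple

# Hypothetical per-operation costs for target-language (LISP-like) constructs.
PRIMITIVE_COST = {"car": 1, "cdr": 1, "cons": 3, "compare": 2}


def cost_fn(expr: Tuple) -> Callable[[int], int]:
    """Derive a cost function of the input size n from a concept's structure.

    expr is ("prim", name) | ("seq", e1, e2, ...) | ("loop", body),
    where a loop is assumed to run n times.
    """
    kind = expr[0]
    if kind == "prim":
        if expr[1] not in PRIMITIVE_COST:
            # The semi-automatic part: an unknown construct needs the user's answer.
            raise KeyError(f"no cost known for {expr[1]!r}; ask the user")
        c = PRIMITIVE_COST[expr[1]]
        return lambda n: c
    if kind == "seq":
        parts = [cost_fn(e) for e in expr[1:]]
        return lambda n: sum(p(n) for p in parts)
    if kind == "loop":
        body = cost_fn(expr[1])
        return lambda n: n * body(n)
    raise ValueError(f"unknown construct {kind!r}")
```

A linear membership scan, `("loop", ("seq", ("prim", "car"), ("prim", "compare"), ("prim", "cdr")))`, yields a function equal to 4n, so its value at 10 is 40.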

The knowledge for managing resources, computing upper and lower cost estimates, directing
attention to different parts of the tree, making implementation decisions, and, in general,
analyzing and directing the growth of the tree is in the form of rules. Each rule consists of a
condition and an action to be performed if the condition is met. The knowledge that a rule
expresses can easily be modified, since the rules are replaceable and can be added, deleted, or
altered without requiring a modification to the system itself.
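A minimal Python sketch of condition-action rules held as data. The rule name, the state keys, and the "fire every matching rule" policy are assumptions for illustration. Adding or deleting a rule changes the list only, while the interpreter in `run` stays the same, which is the property the paragraph above describes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    name: str
    condition: Callable[[Dict], bool]
    action: Callable[[Dict], None]


class RuleBase:
    """Condition-action rules kept as data, separate from the interpreter in run()."""

    def __init__(self) -> None:
        self.rules: List[Rule] = []

    def add(self, rule: Rule) -> None:
        self.rules.append(rule)

    def delete(self, name: str) -> None:
        self.rules = [r for r in self.rules if r.name != name]

    def run(self, state: Dict) -> List[str]:
        """Fire every rule whose condition holds; return the names of fired rules."""
        fired = []
        for rule in self.rules:
            if rule.condition(state):
                rule.action(state)
                fired.append(rule.name)
        return fired
```

A hypothetical `quick-default` rule that sets `decision` to `"list"` whenever `resources_left` drops below 10 fires on a low-resource state, and after `delete("quick-default")` the same state fires nothing.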


Rules 


The rules in LIBRA's knowledge base can generally be divided into three groups: attention and
resource management rules, plausible-implementation rules, and cost-analysis rules.


Attention and resource management rules describe when to shift attention to other nodes in
the tree and also how to set priorities for refining the different constructs and operations
within a refinement node. Some of the more important of these rules determine how LIBRA's
own resources of available time and space are to be allocated, on the basis of where they will
have the greatest impact. One way of determining impact is to consider the difference between
the upper-bound cost estimate (assuming default implementations) and the optimistic
lower-bound cost estimate (assuming both the successful application of efficiency techniques
that have worked in similar situations and the expenditure of sufficient resources to carry the
techniques to completion). Other rules in this group state how to shift attention among nodes.
These rules (a) cause complex programs to be expanded early in order to see what decisions
are involved, (b) postpone trivial decisions until important ones are made, (c) look at all
refinements in the tree and select for development the one whose optimistic cost estimate is
least (when resources for developing a particular refinement are exhausted), and (d) apply a
form of branch and bound, which states that (when resources allocated for considering a
particular decision are exhausted) attention should be directed to the whole tree and that all
nodes whose optimistic cost estimate is worse than the default estimate of some other node
should be eliminated. As described later, when cost-analysis rules compare estimates, they
take into account the degree of uncertainty in the estimates.
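Rule (c) amounts to best-first search on the optimistic estimate. The Python sketch below uses `heapq` to always develop the refinement with the least optimistic cost next. It assumes a budget that simply counts developed nodes, and the tree and costs in the usage note are invented for illustration.

```python
import heapq
from typing import Callable, List


def best_first(root: str, children: Callable[[str], List[str]],
               optimistic: Callable[[str], float], budget: int) -> List[str]:
    """Develop refinements in order of least optimistic cost.

    budget caps how many nodes are developed; returns them in development order.
    """
    frontier = [(optimistic(root), root)]
    developed: List[str] = []
    while frontier and len(developed) < budget:
        _, node = heapq.heappop(frontier)  # refinement with the least optimistic cost
        developed.append(node)
        for child in children(node):
            heapq.heappush(frontier, (optimistic(child), child))
    return developed
```

With hypothetical optimistic costs of 40 for an insertion branch and 55 for a selection branch, and 45 and 60 for the insertion branch's list and array children, a budget of four develops the root, the insertion branch, its list child, and only then the selection branch.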


Plausible-implementation rules express heuristics about when to limit expansion of nodes by
making a decision about some part of an implementation. For example, when the question of
how to represent a set first arises, LIBRA performs a global examination of the program to
determine all uses of the set. If there are many places where the program checks for
membership in the set, then a hash-table representation may be suggested. In general,
plausible-implementation rules express knowledge derived by human or machine analysis of
commonly occurring situations, such as which sorting techniques are best for inputs of
different sizes. These rules also contain heuristics for making quick decisions. Thus, if LIBRA
is running out of resources, heuristics that are not as dependable as the one just described are
used to make decisions on the spot, without creating any new nodes. These heuristics generally
express defaults, such as "use lists rather than arrays if the target language is LISP"; they are
used to make the less important decisions, or to make all decisions if the total resources for
writing a program are nearly exhausted.
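The set-representation rule and its quick-default fallback can be sketched as follows. The membership threshold of 3 and the operation name `"member"` are hypothetical choices. A single returned choice represents a decision made in place, and several choices represent alternatives to be added to the tree as new nodes.

```python
from typing import List


def plausible_set_representation(uses: List[str], low_on_resources: bool,
                                 target: str = "LISP", threshold: int = 3) -> List[str]:
    """Suggest representations for a set from a global scan of its uses.

    Returns a single choice when a decision is made, or several alternatives
    to be added to the tree as new refinement nodes.
    """
    if low_on_resources and target == "LISP":
        # Less dependable quick default, applied on the spot with no new nodes.
        return ["list"]
    if uses.count("member") >= threshold:
        # Many membership tests anywhere in the program: suggest a hash table.
        return ["hash-table"]
    return ["list", "array"]
```

The order of the tests mirrors the text: when resources are nearly gone, the cheap default wins even if a global scan would have suggested something better.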

The final group, the cost-analysis rules, express how to compute, update, and compare upper-
and lower-bound estimates of the cost of the final implementation. The cost estimates are in
the form of symbolic algebraic expressions that may involve variables representing set sizes.
The cost estimates are not computed once and for all. Whenever a refinement in the tree is
further refined (i.e., a refinement rule is applied to some part of a node in the subtree whose
root is the refinement), the cost estimates associated with the refinement are incrementally
updated so as to produce estimates that are more accurate in view of the new information.
Cost estimates are constructed from a knowledge base that includes information on upper and
lower bounds on the time and space costs of individual constructs and operations, and on how
to combine such cost estimates for composite programs. The knowledge needed to
incrementally update the cost estimates is contained in rules corresponding to the particular
construct or operation. The method of comparing the cost estimates of different refinements
involves adding a bonus to the refinement that has a greater degree of completion and that
consequently has greater certainty in its cost estimates (default and optimistic). This feature
favors a nearly complete refinement that has a slightly worse lower bound over a less complete
(more abstract) refinement that has a slightly better lower bound. Such a preference is
desirable since the cost estimate of the more abstract refinement is less certain and therefore
may not be achievable. By giving a bonus for the degree of completion, the cost-analysis rules
take into account the likelihood of being able to achieve the cost estimate.
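A completion bonus can be modeled as a discount on the lower bound that grows with the degree of completion. The multiplicative form `lower * (1 - bonus * completion)` and the default bonus of 0.1 are assumptions for this Python sketch. The text does not give LIBRA's formula, only its effect.

```python
from typing import List, Tuple

Candidate = Tuple[str, float, float]  # (name, optimistic lower bound, completion in [0, 1])


def effective_cost(lower: float, completion: float, bonus: float) -> float:
    """Lower bound reduced by a bonus that grows with the degree of completion."""
    return lower * (1.0 - bonus * completion)


def preferred(candidates: List[Candidate], bonus: float = 0.1) -> str:
    """Name of the candidate with the best (smallest) effective cost."""
    return min(candidates, key=lambda c: effective_cost(c[1], c[2], bonus))[0]
```

A refinement that is 90 percent complete with a lower bound of 100 scores 91, beating a 30-percent-complete one with a lower bound of 95, which scores about 92.2. With the bonus set to zero, the more abstract refinement would win on its raw lower bound.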


Status

LIBRA has guided the application of the PECOS refinement rules to produce efficient
implementations of several variants of simple database retrieval, sorting, and concept
formation programs (see the PSI article for an example of a concept formation program).
Current plans include extending the problem area to include simple algorithms for finding
prime numbers and for reaching nodes in a graph. For an efficiency expert to be of use in a
complete automatic programming system, a good deal more research is needed. Higher level
optimizations, extended symbolic analysis and comparison capabilities, and more domain
expertise are some obvious extensions. Automatic bookkeeping of heuristics, and perhaps even
automatic generation of heuristics from an analysis of symbolic cost estimates of target
language concepts, are some long-range goals. In order to write more complex programs such
as compilers or operating systems, more efficiency rules would have to be added to the system,
rules about concepts such as bit-packing, machine interrupts, and multiprocessing. However,
even with such additions, the efficiency techniques employed by the LIBRA system should be
significant in controlling the combinatorial explosion that occurs during the search for
efficient implementations.

This article closes with the description of an example illustrating LIBRA's present operation in
producing a simple sort program.


Example 

Suppose that a SORT is specified as a transfer of elements from a SOURCE sequential collection
to a TARGET sequential collection that is ordered by some relation such as LESS-THAN. After
the application of some preliminary refinement rules that do not require any decisions among
alternative choices, three choice points remain: choosing a transfer order, and choosing
representations for SOURCE and for TARGET.

Since the transfer order is selected as the most important decision, LIBRA directs attention
first to that choice point. A heuristic rule is applied that suggests the use of either an
insertion sort from list to list or array to array, or a selection sort from list to array. The
different refinement possibilities are added to the tree accordingly. Each of the branches is
given a limited amount of resources and told to focus attention only on the parts of the
program directly relevant to the transfer-order decision.

After these branches are refined within the limits of the assigned resources, the nodes of the
tree are compared. Branch and bound does not eliminate any of the alternatives here, but the
insertion branch is selected because it has the best lower bound (taking into account factors
related to the uncertainty of the estimates).

Refinement then proceeds in that node. The choice of a list or array representation for the
TARGET is made by a heuristic that says that lists are easier to manipulate than arrays in LISP.
This heuristic was applied because much of the time and space resources allocated for finding
an implementation had been consumed in the above tasks, and a quick decision was required.
The choice of a list representation for TARGET forces a list representation for SOURCE
because of a suggestion made under the transfer-order heuristic. Thereafter, the refinement
process is basically straightforward, though several choices of whether to store or recompute
local variables are made.
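For orientation, the kind of program this refinement converges on, an insertion sort that transfers SOURCE elements one at a time into an ordered TARGET, can be sketched in Python. It is written in Python rather than LISP, with Python lists standing in for LISP lists. It is not LIBRA's actual output, and the function and parameter names are hypothetical.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def insertion_sort_transfer(source: Sequence[T], less_than: Callable[[T, T], bool]) -> List[T]:
    """Transfer SOURCE elements one at a time into an ordered TARGET list."""
    target: List[T] = []
    for x in source:
        i = 0
        # Skip past every element that x does not precede; equal elements stay in
        # their original relative order.
        while i < len(target) and not less_than(x, target[i]):
            i += 1
        target.insert(i, x)
    return target
```

The ordering relation is passed in as `less_than`, matching the specification's "ordered by some relation such as LESS-THAN". For example, `insertion_sort_transfer([5, 2, 9, 1, 5], lambda a, b: a < b)` returns `[1, 2, 5, 5, 9]`.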